ABSTRACT

We assume that p random variables, $y_1, \ldots, y_p$, are distributed
according to some multivariate normal distribution (called the p
variate normal). Methods of predicting the value of one, say $y_p$,
given the values of the other p-1 variables are discussed. A study
is made of the problems encountered whenever one tries to reduce the
number of variables used to predict $y_p$ and at the same time minimize
loss in prediction accuracy. Modifications of the step-wise procedure
of adding predictor variables one at a time are considered in
some detail, and methods of using an automatic high speed electronic
computer to perform the numerous calculations involved are described.
A high speed computer program was written to generate samples from
any specified p variate normal,
concepts used in this paper, and as faculty adviser provided the
guidance necessary to apply these concepts; and to Mrs. Bette Joe,
for her most capable typing of this paper.
Chapter I
INTRODUCTION

The multivariate normal distribution with p variables, referred
to here as "the p variate normal", has been found to be useful as a
model for a wide variety of real world phenomena. This distribution
has been studied intensely in the literature and has many "nice"
mathematical properties.

One of the p variate normal's most useful properties is the
fact that when q of the variables are fixed, the remaining p-q
variables become a (p-q) variate normal, which has the same
variance-covariance matrix regardless of the actual fixed values of
the first q variables. Where q equals p-1, the variable whose value
is not fixed, say $y_p$, becomes a conditional normal random variable
whose variance is no greater than the variance of $y_p$ when the
variables $y_1, \ldots, y_{p-1}$ are not fixed.

In chapter II, methods of "predicting" $y_p$ from known fixed
values of the other p-1 variables are described, and methods of
measuring the accuracy of prediction in terms of the variance of $y_p$
are given. These methods require that the p variate normal be
specified completely by a mean vector, U, and a variance-covariance
(V-C) matrix, $\Sigma$. In chapter III, methods of approximating the
work of chapter II using sample estimates of U and $\Sigma$ are described.
These ideas are illustrated by an example in chapter IV.

After mastering the technique of regressing p-1 variables to
form a prediction equation for the last one, $y_p$, we turn to the
problem of eliminating variables that may not be useful in predicting
the value of $y_p$. Variables are eliminated by removing all reference
to them before the prediction equation is computed. Reasons for
reducing the number of variables in regression are presented in chapter
V. Later in chapter V, the process of eliminating variables from
regression is illustrated by an example using a specified five variate
normal.

At present, the only known way to find the "optimum set" of
r ($r \le p-1$) variables is to compute all $\binom{p-1}{r}$ regressions.
Obviously this involves extremely large numbers of computations for
large p, so that methods involving fewer computations are normally used.
Generally these faster methods produce "good" combinations of variables
in regression, but often they are not the "optimum" combination
for the same number of variables in regression.

Chapters VI through IX discuss methods of searching for a
satisfactorily small set of variables in regression that will reduce
the conditional variance of $y_p$ to a satisfactory level. The step-wise
procedure, described in chapter VI, provides the basic procedure
under study throughout the rest of the paper. Basically, this
procedure consists of adding variables to regression in steps. At
each step, the variable to be added is selected because its
contribution to variance reduction is greatest at this step. That this
procedure does not always produce optimal combinations of variables
in regression is demonstrated.
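
The step-wise idea just described can be sketched in a few lines of Python. This is only a minimal illustration, not the author's program: the function names (`solve`, `cond_var`, `forward_steps`) are hypothetical, variables are indexed from 0, and the V-C matrix used is the five variate example specified later in chapter IV. At each step the sketch adds the variable that gives the smallest conditional variance of the last variable.

```python
# V-C matrix of the five variate normal of chapter IV (rows/columns y1..y5).
SIGMA = [
    [34.6025, 20.9233, -31.0517, -24.1667, 64.6633],
    [20.9233, 242.1408, -13.8783, -253.4167, 191.0792],
    [-31.0517, -13.8783, 41.0258, 3.1667, -51.5192],
    [-24.1667, -253.4167, 3.1667, 280.1667, -206.8083],
    [64.6633, 191.0792, -51.5192, -206.8083, 226.3133],
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def cond_var(sigma, subset, target):
    """Conditional variance of variable `target` given the variables in `subset`."""
    if not subset:
        return sigma[target][target]
    s11 = [[sigma[i][j] for j in subset] for i in subset]
    s12 = [sigma[i][target] for i in subset]
    beta = solve(s11, s12)
    return sigma[target][target] - sum(b * s for b, s in zip(beta, s12))


def forward_steps(sigma, target):
    """Add one variable per step, each time the one reducing the variance most."""
    candidates = [i for i in range(len(sigma)) if i != target]
    chosen, path = [], []
    while candidates:
        best = min(candidates, key=lambda k: cond_var(sigma, chosen + [k], target))
        chosen.append(best)
        candidates.remove(best)
        path.append((tuple(sorted(chosen)), cond_var(sigma, chosen, target)))
    return path


path = forward_steps(SIGMA, target=4)
for subset, var in path:
    print([f"y{i + 1}" for i in subset], round(var, 4))
```

On this matrix the sketch enters $y_4$ first and then $y_1$, so the two-variable set it reaches is ($y_1$, $y_4$) rather than the pair ($y_1$, $y_2$) that Table II of chapter V marks as optimal for two variables.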

Also in chapter VI a statistical test to be applied at each
step when a sample is being studied is described. This test provides
a criterion for halting the step-wise process which is a function
of sample size, n.

In chapter VII the automatic performance of regression analysis by a
high speed digital computer is discussed. Additional halting
criteria and other improvements to the step-wise procedure are
suggested. Halt criteria proposed by Miller [7] and Efroymson [3]
are reviewed in light of automatic regression analysis requirements.
A modification to the step-wise procedure reflecting differences in
cost of observation of variables is considered.

In chapter VIII computer programs MV REGRESSION and MV SIM,
written by the author, are presented. Basically, MV SIM generates
samples of a specified size, n, from a given p variate normal, with
which MV REGRESSION performs regression analyses. MV SIM also
computes regression parameters of the given p variate normal, the
results of which may be used as a standard for comparison
with results of regression analysis of the samples.

In  chapter  IX  current  and  proposed  studies  using  these  high 
speed  computer  programs  are  outlined. 

Appendix A describes the operation of program MV SIM in detail
and gives some background on the techniques used by MV SIM to generate
samples from specified p variate normals.

Appendix B describes statistical tests performed by MV SIM
on sample vectors, Z, and sample V-C matrices, S. Results of
tests performed on a number of generated samples of different sizes
of a five variate normal and an 18 variate normal are given.


Chapter II
THE P VARIATE NORMAL DISTRIBUTION

In this chapter we introduce the multivariate normal distribution
with p variables, hereinafter called the "p variate normal".
The basic theory associated with the p variate normal is given in
detail by Graybill [4] and Anderson [1]. Certain theorems and
formulas that are important for later work on regression analysis
are given here.

A p variate normal is completely defined by any specified
p×1 vector of means, U, and any p×p positive definite symmetric
variance-covariance (V-C) matrix, $\Sigma$. Let:

\[ Y = (y_1, \ldots, y_p)^T, \qquad U = (u_1, \ldots, u_p)^T, \qquad \Sigma = (\sigma_{ij}), \quad i, j = 1, \ldots, p. \]

The joint density function of the p variate normal, Y, is given
by:

(2.1)    $f(Y) = (2\pi)^{-p/2} \, |\Sigma|^{-1/2} \exp\left[ -\tfrac{1}{2} (Y-U)^T \Sigma^{-1} (Y-U) \right]$

for $-\infty < y_i < \infty$, i = 1, ..., p.

The element $\sigma_{ij}$ of $\Sigma$ is the covariance between variables $y_i$
and $y_j$, and $u_i$ of U is the mean of $y_i$.

If the p×1 vector Y is partitioned into two subvectors such
that:

\[ Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix} \]

(vectors $Y_1$ and $Y_2$ are (p-q)×1 and q×1 respectively), and if:

\[ U = \begin{pmatrix} U_1 \\ U_2 \end{pmatrix} \quad \text{and} \quad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} \]

are the corresponding partitions of U and $\Sigma$, then it can be shown
([4], section 3.6) that the conditional distribution of the q×1 vector
$Y_2$ given the vector $Y_1 = Y_1^*$ (a constant vector), $Y_2|Y_1^*$, is the
multivariate normal distribution with q×1 mean vector

\[ U_2 + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1), \]

and q×q V-C matrix

\[ \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}. \]

From the latter matrix we see the important fact that the
covariance matrix of the conditional random vector $Y_2|Y_1^*$ does not
depend upon the value of $Y_1^*$.

We shall represent the q×q V-C matrix of $Y_2|Y_1^*$ as:

(2.2)    $\Sigma_{22.1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$.


In particular, each diagonal element, $\sigma_{ii.1,\ldots,p-q}$, of this matrix
(i = p-q+1, ..., p) is the conditional variance of variable $y_i$ in $Y_2$,
i.e. the variance of $y_i$ when the p-q variables in $Y_1$ are fixed.
The element $\sigma_{ii}$ in the specified V-C matrix, $\Sigma$, is the variance
of $y_i$ in the original p variate normal distribution. That $\sigma_{ii}$ is
greater than or equal to $\sigma_{ii.1,\ldots,p-q}$ follows from formula 2.2
above, and the fact that $\Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$ is positive
semidefinite. In fact, the following relationship holds (where $0 \le R_i \le 1$):

\[ \sigma_{ii.1,\ldots,p-q} = (1 - R_i^2) \, \sigma_{ii}. \]

In this formula $R_i$ is the multiple correlation coefficient between
variable $y_i$ and vector $Y_1$; see [4], section 3.6.
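
The partitioned formulas above translate directly into a short computation. The following is a minimal sketch in plain Python, not taken from the paper: the names `solve` and `conditional_normal` and the argument `n1` (the number p-q of fixed variables, assumed to come first) are hypothetical, and the two-variable numbers are made up for illustration.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def conditional_normal(u, sigma, n1, y1_star):
    """Mean vector and V-C matrix of Y2 given Y1 = y1_star.

    The first n1 variables form Y1 (fixed); the remaining q = p - n1 form Y2.
    """
    p = len(u)
    q = p - n1
    s11 = [row[:n1] for row in sigma[:n1]]
    # w[j] = Sigma_11^{-1} times column j of Sigma_12
    w = [solve(s11, [sigma[i][n1 + j] for i in range(n1)]) for j in range(q)]
    d = [y - m for y, m in zip(y1_star, u[:n1])]
    mean = [u[n1 + j] + sum(wi * di for wi, di in zip(w[j], d)) for j in range(q)]
    cov = [[sigma[n1 + a][n1 + b]
            - sum(sigma[n1 + a][i] * w[b][i] for i in range(n1))
            for b in range(q)] for a in range(q)]
    return mean, cov


# Illustrative two-variable case: y1 fixed at 3.0, y2 conditional.
u = [1.0, 2.0]
sigma = [[4.0, 2.0], [2.0, 3.0]]
mean, cov = conditional_normal(u, sigma, 1, [3.0])
r_squared = 1.0 - cov[0][0] / sigma[1][1]   # from sigma_ii.1 = (1 - R^2) sigma_ii
print(mean, cov, r_squared)
```

Calling the function with a different fixed value changes the conditional mean but leaves the conditional V-C matrix unchanged, which is the property noted after formula 2.2.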

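In this paper we will consider only the case where q = 1.

Now $Y = \begin{pmatrix} Y_1 \\ Y_2 \end{pmatrix}$, where Y is still p×1, $Y_1$ is (p-1)×1, and $Y_2$ is the
variable $y_p$. Similarly, we partition $Y^*$, U, and $\Sigma$ so that
elements $Y_2$, $U_2$, and $\Sigma_{22}$ become $y_p$, $u_p$, and $\sigma_{pp}$ respectively.

It follows from earlier discussions that the distribution of
$y_p|Y_1^*$ is the univariate normal distribution with (scalar) mean:

(2.3)    $u_{p.1,\ldots,p-1} = u_p + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1)$,

and scalar variance:

(2.4)    $\sigma_{pp.1,\ldots,p-1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = \sigma_{pp} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$.

Let $\beta$ be the (p-1)×1 vector $(\beta_1, \ldots, \beta_{p-1})^T = (\Sigma_{21} \Sigma_{11}^{-1})^T$. From
formula 2.3, we can write:

(2.5)    $y_p|Y_1^* = u_p + \beta^T (Y_1^* - U_1) + e$
                    $= u_p + \sum_{i=1}^{p-1} \beta_i (y_i^* - u_i) + e$
                    $= u_p - \sum_{i=1}^{p-1} \beta_i u_i + \sum_{i=1}^{p-1} \beta_i y_i^* + e$,

where $U_1$ is still $(u_1, \ldots, u_{p-1})^T$, $Y_1^* = (y_1^*, \ldots, y_{p-1}^*)^T$, and e is a normally
distributed random variable with mean zero. The variance of e is
$\sigma_{pp.1,\ldots,p-1}$, the value of which is independent of the actual values
of $Y_1^*$.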

We define formula 2.5 as the prediction equation for $y_p$ associated
with the p variate normal. Most often we shall use it in the form:

(2.6)    $y_p|Y_1^* - e = E(y_p|Y_1^*) = u_p + \sum_{i=1}^{p-1} \beta_i (y_i^* - u_i)$.

Now, if we know the fixed values of $Y_1^*$ (in addition to U and $\Sigma$)
we can use 2.6 to compute the mean of the conditional random
variable $y_p|Y_1^*$. A measure of the "error" involved in using the
results of 2.6 to "predict" the value of $y_p$, when $Y_1^*$ is known, is
given by $\sigma_{pp.1,\ldots,p-1}$. By comparison, if the value of $Y_1^*$ is not
known, one might use the original mean of $y_p$, $u_p$, to "predict" the
value of $y_p$. The corresponding "error" of this prediction is given
by $\sigma_{pp}$, which is greater than or equal to $\sigma_{pp.1,\ldots,p-1}$. The values of the
scalars, $\beta_i$, in vector $\beta$ are called partial regression coefficients.

Suppose the computed values of some of the partial regression
coefficients $\beta_j$, $\beta_k$, etc. are zero, or close to zero. Then,
obviously, insofar as estimating $y_p$ is concerned, one can save the
effort and cost of observing the values of $y_j$, $y_k$, etc.

It often happens, especially when the number of variables,
p, is large, that some of the variables themselves can be predicted
rather accurately by a linear combination of other variables.
This shows that even if none of the partial regression coefficients
are close to zero, it may be possible to observe only a select few
of the variables and still predict $y_p$ nearly as accurately as when
all of the variables are used.

Of course, the values of the partial regression coefficients
to be used with each variable depend upon which other variables
are used in combination to predict $y_p$. Throughout this paper, any
combination of the original p-1 variables that are used to predict
$y_p$ in the manner just described will be said to be "in regression".
The variables whose values are not to be used to predict $y_p$ we shall
say are "not in regression".

Once a combination of variables to be in regression has been
chosen, a modified mean vector and V-C matrix are formed from
the original U and $\Sigma$ respectively by removing the $u_j$ from U and
$\sigma_{ij}$ and $\sigma_{jk}$ (for all i and k) from $\Sigma$ for each variable $y_j$ that
is to be "not in regression". (If q of the p-1 variables
$y_1, \ldots, y_{p-1}$ are to be "not in regression", then the modified U is
(p-q)×1 and the modified $\Sigma$ is (p-q)×(p-q).) Thus, we see that all
reference to those variables not in regression is completely removed
and a new (p-q) variate normal is defined by the modified U and $\Sigma$,
from which new prediction equations (2.5 or 2.6) can be computed.
Note that it is possible to make up $\sum_{i=1}^{p-1} \binom{p-1}{i}$ prediction
equations for predicting variable $y_p$, one for each possible
combination of variables $y_1, \ldots, y_{p-1}$.
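
Removing a variable from regression amounts to deleting its entry from U and its row and column from the V-C matrix. A minimal Python sketch of that bookkeeping follows; the function name `drop_variables`, the 0-based indexing, and the small four-variable numbers are assumptions made only for illustration.

```python
def drop_variables(u, sigma, not_in_regression):
    """Return the modified mean vector and V-C matrix after deleting the
    entries, rows, and columns of the variables listed (0-based indices)."""
    removed = set(not_in_regression)
    keep = [i for i in range(len(u)) if i not in removed]
    u_mod = [u[i] for i in keep]
    sigma_mod = [[sigma[i][j] for j in keep] for i in keep]
    return u_mod, sigma_mod


# Illustrative four-variable case; the last variable is the one to be predicted.
u = [1.0, 2.0, 3.0, 4.0]
sigma = [
    [6.0, 2.0, 3.0, 4.0],
    [2.0, 9.0, 6.0, 8.0],
    [3.0, 6.0, 14.0, 12.0],
    [4.0, 8.0, 12.0, 21.0],
]
u_mod, sigma_mod = drop_variables(u, sigma, [1])   # second variable not in regression
print(u_mod)
print(sigma_mod)
```

With q = 1 variable removed from p = 4, the result is a 3×1 mean vector and a 3×3 V-C matrix, matching the (p-q) dimensions stated above.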

In chapters which follow we will discuss methods of estimating
the partial regression coefficients, $\beta_i$, and $\sigma_{pp.1,\ldots,q}$, etc.,
when the values of U and $\Sigma$ are not known. Methods of choosing
which variables are "best" to use in regression will be discussed.
We shall also consider the problem of specifying relative "cost"
of observation per unit reduction in $\sigma_{pp.1,\ldots,q}$.

Chapter III
STATISTICAL ANALYSIS OF THE P VARIATE NORMAL

In this chapter we assume that Y has a p variate normal distribution
with unknown mean vector U and V-C matrix $\Sigma$. We are now
concerned with methods by which an experimenter can estimate U and
$\Sigma$, and subsequently, other parameters, such as regression
coefficients for prediction equations for predicting $y_p$; and $\sigma_{pp.1,\ldots,q}$,
the conditional variance of $y_p|Y_1^*$ when variables $y_1, \ldots, y_q$ are in
regression. In order to distinguish estimates of parameters from
their associated theoretical values, it is convenient to develop new
notation to be used throughout this paper, listed here for easy
reference:

TABLE I

Notation for                                                   Notation for
Theoretical                                                    Associated Estimated
Values                  Meaning of Parameter                   Parameters

U                       p×1 mean vector of the p               Z
                        variate normal

$\Sigma$                p×p V-C matrix of the p                S
                        variate normal

$\beta$ (beta)          q×1 vector of regression               B
                        coefficients associated with
                        q variables in regression

$\sigma_{pp}$           The element in row p, column           $s_{pp}$
                        p of $\Sigma$, which is the
                        (unconditional) variance of $y_p$.
                        $y_p$ is arbitrarily chosen to be
                        the variable to be predicted.

$\sigma_{pp.1,\ldots,q}$   The conditional variance of         $s_{pp.1,\ldots,q}$
                        $y_p|Y_1^*$, where $Y_1^*$ is a q×1
                        vector of fixed values of
                        $y_1, \ldots, y_q$


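A sample of size n can be arranged into n×p matrix form
as follows:

\[ \begin{pmatrix} y_{11} & y_{12} & \cdots & y_{1p} \\ y_{21} & y_{22} & \cdots & y_{2p} \\ \vdots & \vdots & & \vdots \\ y_{n1} & y_{n2} & \cdots & y_{np} \end{pmatrix} \]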

where $y_{ji}$ represents the j th observation of variable $y_i$. Note
that for this sample, observations of $y_p$, the variable later to
be predicted, are also required. Sample means are computed as:

\[ z_i = \bar{y}_i = \frac{1}{n} \sum_{j=1}^{n} y_{ji}, \quad \text{for } i = 1, 2, \ldots, p. \]


Sample covariances as:

\[ s_{ik} = \frac{\sum_{j=1}^{n} (y_{ji} - \bar{y}_i)(y_{jk} - \bar{y}_k)}{n - 1}, \quad \text{for } i, k = 1, \ldots, p. \]

For i = k the sample covariances become the sample variances:

\[ s_{ii} = \frac{\sum_{j=1}^{n} (y_{ji} - \bar{y}_i)^2}{n - 1}. \]


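By analogy to the mean vector U and V-C matrix, $\Sigma$, we form
the p×1 sample mean vector, Z, and sample V-C matrix, S, as follows:

\[ Z = \begin{pmatrix} z_1 \\ \vdots \\ z_p \end{pmatrix}, \qquad S = \begin{pmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ \vdots & & & \vdots \\ s_{p1} & s_{p2} & \cdots & s_{pp} \end{pmatrix}. \]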


It can be verified easily that Z and S are unbiased estimates
of U and $\Sigma$ respectively; and that Z and $\frac{n-1}{n} S$ are maximum
likelihood estimates of U and $\Sigma$.
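
The sample statistics defined above can be computed as in the following minimal Python sketch. The function names and the three-row data set are hypothetical and serve only to show the n-1 divisor used for S and the (n-1)/n rescaling that gives the maximum likelihood estimate.

```python
def sample_mean(rows):
    """Sample mean vector Z of an n x p sample (one observation per row)."""
    n, p = len(rows), len(rows[0])
    return [sum(r[i] for r in rows) / n for i in range(p)]


def sample_cov(rows, mle=False):
    """Sample V-C matrix: divisor n-1 (unbiased S) or n (maximum likelihood)."""
    n, p = len(rows), len(rows[0])
    z = sample_mean(rows)
    divisor = n if mle else n - 1
    return [[sum((r[i] - z[i]) * (r[k] - z[k]) for r in rows) / divisor
             for k in range(p)] for i in range(p)]


rows = [[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]]   # illustrative data only
Z = sample_mean(rows)
S = sample_cov(rows)
S_mle = sample_cov(rows, mle=True)
print(Z, S, S_mle)
```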

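To develop estimates of the parameters of the conditional
distribution of $y_p|Y_1^*$ we recall that the random variable $y_p|Y_1^*$ is
normally distributed with mean and variance given by equations 2.3
and 2.4. We partition Y, $Y^*$, Z, S, as we did Y, $Y^*$, U, and $\Sigma$,
respectively in Chapter II:

\[ Y = \begin{pmatrix} Y_1 \\ y_p \end{pmatrix}, \quad Y^* = \begin{pmatrix} Y_1^* \\ y_p^* \end{pmatrix}, \quad Z = \begin{pmatrix} Z_1 \\ z_p \end{pmatrix}, \quad S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & s_{pp} \end{pmatrix}, \]

where, as before:

\[ Y_1 = (y_1, \ldots, y_{p-1})^T, \qquad Y_1^* = (y_1^*, \ldots, y_{p-1}^*)^T \text{ (constant vector)}, \]

and $Z_1 = (z_1, \ldots, z_{p-1})^T$.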


Since $z_i$ and $\frac{n-1}{n} s_{ij}$ are maximum likelihood estimates of $u_i$
and $\sigma_{ij}$ respectively, for i, j = 1, ..., p, it follows from the
invariance property of maximum likelihood estimates that:

\[ z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1), \quad \text{and} \quad \frac{n-1}{n} \left[ s_{pp} - S_{21} S_{11}^{-1} S_{12} \right] \]

are maximum likelihood estimates of $u_p + \Sigma_{21} \Sigma_{11}^{-1} (Y_1^* - U_1)$, and
$\sigma_{pp} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12}$ respectively.
Let:

(3.1)    $z_{p.1,\ldots,p-1} = z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1)$

and,

(3.2)    $s_{pp.1,\ldots,p-1} = \left[ s_{pp} - S_{21} S_{11}^{-1} S_{12} \right] \cdot \frac{n-1}{n-p}$.

It can be shown that $z_{p.1,\ldots,p-1}$ and $s_{pp.1,\ldots,p-1}$ are
unbiased estimates of $u_{p.1,\ldots,p-1}$ and $\sigma_{pp.1,\ldots,p-1}$, the mean
and variance of the conditional random variable $y_p|Y_1^*$ respectively.

Similarly, if we let $B = (b_1, \ldots, b_{p-1})^T = (S_{21} S_{11}^{-1})^T$, B is a maximum
likelihood estimator of $\beta = (\Sigma_{21} \Sigma_{11}^{-1})^T$, that is, $b_i$ is a M.L.E. of
$\beta_i$ for i = 1, ..., p-1. It can be shown that B is also an unbiased
estimator of $\beta$.

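We can write B in the form (since $S_{11}$ is positive definite):

(3.3)    $B = S_{11}^{-1} S_{12}$;

\[ \begin{pmatrix} b_1 \\ \vdots \\ b_{p-1} \end{pmatrix} = \begin{pmatrix} s_{11} & \cdots & s_{1,p-1} \\ \vdots & & \vdots \\ s_{p-1,1} & \cdots & s_{p-1,p-1} \end{pmatrix}^{-1} \begin{pmatrix} s_{1p} \\ \vdots \\ s_{p-1,p} \end{pmatrix}. \]

Equations 3.3 are called the normal equations.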

Substituting $z_{p.1,\ldots,p-1}$ for $u_{p.1,\ldots,p-1}$, we obtain an
unbiased estimate for the value of the prediction equation, 2.6, by:

(3.4)    $\hat{E}(y_p|Y_1^*) = z_{p.1,\ldots,p-1} = z_p + S_{21} S_{11}^{-1} (Y_1^* - Z_1) = z_p + \sum_{i=1}^{p-1} b_i (y_i^* - z_i)$.
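
The estimation steps of this chapter can be strung together as in the sketch below. It is a minimal Python illustration under stated assumptions, not the paper's program: the names `solve` and `estimate` are hypothetical, the last column of the sample is taken to be $y_p$, and the tiny data set is constructed so that $y_3 = 1 + 2 y_1 + 3 y_2$ exactly, which makes the expected answers obvious.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


def estimate(rows):
    """Return (B, constant term, s_pp.1,...,p-1) for an n x p sample."""
    n, p = len(rows), len(rows[0])
    z = [sum(r[i] for r in rows) / n for i in range(p)]
    s = [[sum((r[i] - z[i]) * (r[k] - z[k]) for r in rows) / (n - 1)
          for k in range(p)] for i in range(p)]
    s11 = [row[:p - 1] for row in s[:p - 1]]
    s12 = [s[i][p - 1] for i in range(p - 1)]
    b = solve(s11, s12)                                  # normal equations S11 B = S12
    constant = z[p - 1] - sum(bi * zi for bi, zi in zip(b, z[:p - 1]))
    # residual variance with the (n-1)/(n-p) factor of formula 3.2
    cond_var = (s[p - 1][p - 1] - sum(bi * si for bi, si in zip(b, s12))) * (n - 1) / (n - p)
    return b, constant, cond_var


rows = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]   # illustrative data
B, constant, cond_var = estimate(rows)
print(B, constant, cond_var)
```

Because the illustrative data follow the linear relation without error, the estimated conditional variance is zero up to rounding; with real samples it would be positive.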
Chapter IV
AN EXAMPLE

Assume that an experimenter wishes to gather data from some
process involving five variables, which he assumes to be related
according to a five variate normal distribution. Suppose this five
variate normal actually is defined (completely) by the following
theoretical vector, U, and V-C matrix, $\Sigma$:

(4.1)
\[ U = \begin{pmatrix} u_1 \\ u_2 \\ u_3 \\ u_4 \\ u_5 \end{pmatrix} = \begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \\ 95.4200 \end{pmatrix} \]

(4.2)
\[ \Sigma = \begin{pmatrix}
 34.6025 &   20.9233 & -31.0517 &  -24.1667 &   64.6633 \\
 20.9233 &  242.1408 & -13.8783 & -253.4167 &  191.0792 \\
-31.0517 &  -13.8783 &  41.0258 &    3.1667 &  -51.5192 \\
-24.1667 & -253.4167 &   3.1667 &  280.1667 & -206.8083 \\
 64.6633 &  191.0792 & -51.5192 & -206.8083 &  226.3133
\end{pmatrix} \]


Using developments of chapter II, we let $Y = (y_1, y_2, y_3, y_4, y_5)^T$.

1. The values of U and $\Sigma$ used here were computed as sample vector Z
and V-C matrix S using data from table 20.4, page 647 of Hald.
The results of tables 20.5 and 20.6 of Hald were used to verify the
results of computer program MV SIM, which performed most of the
computations required for this paper.


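We know that we are going to be given values of $y_1$, $y_2$, $y_3$, $y_4$,
from which we will predict $y_5$. Hence, we must set up the prediction
equation for $y_5$ (equation 2.6). Accordingly, we partition U and $\Sigma$
as:

\[ U = \begin{pmatrix} U_1 \\ u_5 \end{pmatrix} = \left( \begin{array}{c} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \\ \hline 95.4200 \end{array} \right) \]

\[ \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix} = \left( \begin{array}{rrrr|r}
 34.6025 &   20.9233 & -31.0517 &  -24.1667 &   64.6633 \\
 20.9233 &  242.1408 & -13.8783 & -253.4167 &  191.0792 \\
-31.0517 &  -13.8783 &  41.0258 &    3.1667 &  -51.5192 \\
-24.1667 & -253.4167 &   3.1667 &  280.1667 & -206.8083 \\ \hline
 64.6633 &  191.0792 & -51.5192 & -206.8083 &  226.3133
\end{array} \right) \]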


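For $\beta$, we get

\[ \beta = (\Sigma_{21} \Sigma_{11}^{-1})^T = \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \\ \beta_4 \end{pmatrix} = \begin{pmatrix} 1.5513 \\ .5103 \\ .1021 \\ -.1438 \end{pmatrix}. \]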


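The prediction equation for $y_5$ becomes:

(4.3)    $E(y_5|Y_1^*) = u_5 - \sum_{i=1}^{4} \beta_i u_i + \sum_{i=1}^{4} \beta_i y_i^*$

\[ = 95.4200 - (1.5513 \;\; .5103 \;\; .1021 \;\; -.1438) \begin{pmatrix} 7.4600 \\ 48.1500 \\ 11.7700 \\ 30.0000 \end{pmatrix} + (1.5513 \;\; .5103 \;\; .1021 \;\; -.1438) \begin{pmatrix} y_1^* \\ y_2^* \\ y_3^* \\ y_4^* \end{pmatrix} \]

or,

(4.4)    $E(y_5|Y_1^*) = 62.3881 + 1.5513 \, y_1^* + .5103 \, y_2^* + .1021 \, y_3^* - .1438 \, y_4^*$.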


The variance of $y_5|Y_1^*$, $\sigma_{55.1,2,3,4}$ (the conditional variance
of $y_5$ given $y_1$, $y_2$, $y_3$, $y_4$), is:

\[ \sigma_{55.1,2,3,4} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = 226.3133 - 222.3242 = 3.9891, \]

which is a measure of the prediction error when formula 4.4 is used
to predict $y_5$ when $Y_1^*$ is known. By comparison, if the values of $Y_1^*$
were ignored and if, instead, the value $u_5 = E(y_5) = 95.4200$ was
always used to estimate $y_5$, the corresponding measure of the
prediction error would be $\sigma_{55} = 226.3133$.
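
The figures in this example can be checked with a few lines of Python. The sketch below is only an illustration and not the author's MV SIM program: `solve` is a hypothetical helper, and U and the V-C matrix are the values of 4.1 and 4.2 as printed, so the last digits may differ slightly from the text because of rounding.

```python
U = [7.4600, 48.1500, 11.7700, 30.0000, 95.4200]
SIGMA = [
    [34.6025, 20.9233, -31.0517, -24.1667, 64.6633],
    [20.9233, 242.1408, -13.8783, -253.4167, 191.0792],
    [-31.0517, -13.8783, 41.0258, 3.1667, -51.5192],
    [-24.1667, -253.4167, 3.1667, 280.1667, -206.8083],
    [64.6633, 191.0792, -51.5192, -206.8083, 226.3133],
]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(m[r][c]))
        m[c], m[piv] = m[piv], m[c]
        for r in range(c + 1, n):
            f = m[r][c] / m[c][c]
            for k in range(c, n + 1):
                m[r][k] -= f * m[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


p = len(U)
sigma11 = [row[:p - 1] for row in SIGMA[:p - 1]]
sigma12 = [SIGMA[i][p - 1] for i in range(p - 1)]

beta = solve(sigma11, sigma12)                       # beta = Sigma_11^{-1} Sigma_12
constant = U[p - 1] - sum(b * u for b, u in zip(beta, U[:p - 1]))
cond_var = SIGMA[p - 1][p - 1] - sum(b * s for b, s in zip(beta, sigma12))

print("beta     :", [round(b, 4) for b in beta])
print("constant :", round(constant, 4))
print("cond var :", round(cond_var, 4))
```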



Thus, by knowing the mean vector U, and the V-C matrix $\Sigma$, as
given by formulas 4.1 and 4.2, we can set up the above prediction
equation, 4.4. Then for any set of values $y_1$, $y_2$, $y_3$, $y_4$, we can
make an accurate prediction of $y_5$ without observing its value.

The problem facing the experimenter is more complicated than
the one discussed in the preceding paragraphs. This is because
he does not know the values of the mean vector, U, and the V-C matrix,
$\Sigma$. All he knows is that (by assumption) $y_1$, $y_2$, $y_3$, $y_4$, $y_5$ are
distributed according to some five variate normal distribution, and,
therefore, are completely specified by some theoretical mean vector
U and V-C matrix $\Sigma$ whose actual values he will never know.

Assume the experimenter draws a sample of size 500 from this
five variate normal distribution (specified by equations 4.1 and
4.2). He then computes all sample means, variances, and covariances
($\bar{y}_i = z_i$, $s_{ii}$, $s_{ij}$ respectively) and forms the sample mean
vector, Z, and sample V-C matrix, S, as defined in chapter III.
Suppose, as an example, he obtains the following results upon
drawing a sample of size 500:


\[ Z = \begin{pmatrix} Z_1 \\ z_5 \end{pmatrix} = \left( \begin{array}{c} 7.7764 \\ 48.7155 \\ 11.5687 \\ 29.3039 \\ \hline 96.4816 \end{array} \right) \]

\[ S = \begin{pmatrix} S_{11} & S_{12} \\ S_{21} & s_{55} \end{pmatrix} = \left( \begin{array}{rrrr|r}
 34.6160 &   20.1495 & -32.8248 &  -21.5656 &   64.0139 \\
 20.1495 &  217.9056 & -18.3443 & -223.1718 &  171.9229 \\
-32.8248 &  -18.3443 &  42.8371 &    7.4936 &  -57.4059 \\
-21.5656 & -223.1718 &   7.4936 &  242.3209 & -180.4153 \\ \hline
 64.0139 &  171.9229 & -57.4059 & -180.4153 &  211.0392
\end{array} \right) \]


To estimate $\beta$ by B we compute

\[ B = S_{11}^{-1} S_{12} = \begin{pmatrix} 1.6360 \\ .5921 \\ .1774 \\ -.0590 \end{pmatrix}. \]

Hence, the estimate of the prediction equation 3.4 becomes:

\[ \hat{E}(y_5|Y_1^*) = z_5 - \sum_{i=1}^{4} b_i z_i + \sum_{i=1}^{4} b_i y_i^* = 54.5921 + 1.6360 \, y_1^* + .5921 \, y_2^* + .1774 \, y_3^* - .0590 \, y_4^*. \]

The unbiased estimate of the conditional variance of $y_5|Y_1^*$
would be:

\[ s_{55.1,2,3,4} = \left[ s_{55} - S_{21} S_{11}^{-1} S_{12} \right] \cdot \frac{n-1}{n-p} = 4.0368 \times \frac{499}{495} = 4.0694. \]


Now the experimenter is in a position to predict the value of $y_5$
given a set of values $y_1^*$, $y_2^*$, $y_3^*$, $y_4^*$. For example, suppose he
is given that the $y_i^* = u_i$, the true means of the $y_i$. (Of course,
he doesn't know that these are true means.)

Using the true prediction equation, we get (see 4.3):

\[ E(y_5|Y_1^* = U_1) = u_5 - \sum_{i=1}^{4} \beta_i u_i + \sum_{i=1}^{4} \beta_i u_i = 95.4194. \]


The experimenter would estimate this value as:

\[ \hat{E}(y_5|Y_1^*) = 54.5921 + (1.6360)(7.4600) + (.5921)(48.1500) + (.1774)(11.7700) - (.0590)(30.0000) = 95.6243. \]
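
The two predictions just compared are simple sums of products, as the following Python sketch shows. The helper name `predict` is hypothetical; the constants and coefficients are the rounded values printed in equation 4.4 and in the estimated prediction equation of this chapter.

```python
def predict(constant, coefficients, values):
    """Evaluate a prediction equation: constant + sum of coefficient * value."""
    return constant + sum(c * v for c, v in zip(coefficients, values))


y_star = [7.4600, 48.1500, 11.7700, 30.0000]        # the true means u1..u4

true_eq = (62.3881, [1.5513, 0.5103, 0.1021, -0.1438])    # equation 4.4
sample_eq = (54.5921, [1.6360, 0.5921, 0.1774, -0.0590])  # estimated from the sample

true_value = predict(true_eq[0], true_eq[1], y_star)
estimated_value = predict(sample_eq[0], sample_eq[1], y_star)
print(round(true_value, 4), round(estimated_value, 4))
```

The first value differs from $u_5 = 95.4200$ only in the fourth decimal place, which reflects the four-decimal rounding of the coefficients in 4.4.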
Chapter V

REDUCTION IN THE NUMBER OF VARIABLES IN REGRESSION - INTRODUCTION

Experience has shown that when the number of variables, p, is
large, say over 20, usually a relatively small number of variables
can be found to use in regression to predict $y_p$ nearly as accurately
as when all p-1 variables are used in regression [9, page 20].
Finding such a small combination of variables is desirable for a
number of reasons:

1)  The prediction equation has fewer terms; thus it is
    easier to compute a predicted value of $y_p$.

2)  Fewer variables need to be observed in order to make
    a prediction of $y_p$. Presumably this would result in
    reducing the cost of observing variables for each
    prediction of $y_p$.

3)  When p is large, the prediction equation involving p-1
    variables requires many computations. Step-wise
    procedures, described later, when yielding a relatively
    small number of variables in regression produce a
    prediction equation with much less effort.

4)  When the regression is being performed on a sample,
    variables that do not contribute much variance
    reduction of $y_p$ can actually cause the prediction equation
    to yield a worse fit to the underlying (specified) p
    variate normal than would result if they were omitted
    from regression. The reason is that the longer equation
    can overfit the sample and ascribe some of the variation
    due to small scale random fluctuations to one of the
    predictors "by accident".

As one would suspect, whenever a single variable, $y_k$, is added
to regression the old conditional variance, say $\sigma_{pp.1,\ldots,q}$, is
always greater than or equal to the new conditional variance,
$\sigma_{pp.1,\ldots,q,k}$. However, usually the amount by which $\sigma_{pp.1,\ldots,q}$
is reduced becomes small as the number of variables in regression
increases, even though optimal combinations for each number of
variables in regression are used. To illustrate this idea, let us
consider an example of a regression problem under ideal conditions.
That is, we shall examine a p variate normal specified in terms of
vector U and matrix $\Sigma$.

We first compute the prediction equation for $y_p$, and the
associated conditional variance $\sigma_{pp.1,\ldots,q}$, for each possible
combination of variables $y_1, \ldots, y_{p-1}$ in regression
($\sum_{i=1}^{p-1} \binom{p-1}{i}$ sets of prediction equations to solve). We shall then
group the results according to number of variables in regression, and
from each group pick the "optimal" combination of variables in
regression; that is, the combination of variables, say $y_1, \ldots, y_q$, in
regression producing the smallest $\sigma_{pp.1,\ldots,q}$. (1)

1. Clarification of notation: The reader should understand that
whenever a "combination of variables in regression, say $y_1, \ldots, y_q$,"
and the associated conditional variance, $\sigma_{pp.1,\ldots,q}$, is discussed
as in the preceding paragraph, the q variables in regression are not
necessarily meant to be regarded as the first q variables as defined
by position in the original vector U and matrix $\Sigma$. In other words,
in order to ease notational difficulty, variables in regression are
temporarily relabeled $y_1, \ldots, y_q$.
With this grouping, we can now start with one variable in
regression and add to the number of variables in regression one at
a time, each time choosing the "optimal" combination of variables
for that group, until we decide that adding more variables to
regression will not reduce $\sigma_{pp.1,\ldots,q}$ enough to make it worthwhile.

In our example, we shall use the five variate normal as defined
by 4.1 and 4.2. Using only $y_1$ in regression, the prediction
equation becomes:

\[ E(y_5|Y_1^*) = u_5 - \beta_1 u_1 + \beta_1 y_1^*, \]

where $\beta_1 = (\Sigma_{21} \Sigma_{11}^{-1})^T = 64.6633 \times \frac{1}{34.6025} = 1.8687$,

and

\[ \sigma_{55.1} = \Sigma_{22} - \Sigma_{21} \Sigma_{11}^{-1} \Sigma_{12} = 226.3133 - \frac{64.6633 \times 64.6633}{34.6025} = 105.4739. \]

Similarly, we compute partial regression coefficients, $\beta_i$,
for all 15 possible combinations of the variables $y_1$, $y_2$, $y_3$, $y_4$
in regression. Table II shows the results.
Table II

Variables                Partial Regression Coefficients          Associated
in                                                                 Conditional
Regression         q     $\beta_1$   $\beta_2$   $\beta_3$   $\beta_4$      Variance $\sigma_{55.1,\ldots,q}$

  y1               1     1.8687                                      105.4739
  y2               1                  .7891                           75.5280
  y3               1                            -1.2557              161.6167
* y4               1                                        -.7381    73.6553

* y1 y2            2     1.4683       .6622                            4.8261
  y1 y3            2     2.3125                   .4945              102.2551
  y1 y4            2     1.4399                             -.6139     6.2303
  y2 y3            2                  .7313     -1.0083               34.6208
  y2 y4            2                  .3108                 -.4569    72.4065
  y3 y4            2                            -1.1998     -.7245    14.6447

  y1 y2 y3         3     1.6959       .6569       .2500                4.0096
* y1 y2 y4         3     1.4519       .4160                 -.2365     3.9982
  y1 y3 y4         3     1.0518                  -.4100     -.6427     4.2368
  y2 y3 y4         3                 -.9234     -1.4479    -1.5570     6.1506

* y1 y2 y3 y4      4     1.5513       .5103       .1021     -.1438     3.9891

Note: Each group is identified by the value of q.

*   Indicates the "optimal" combination of variables for the group
    (for that number of variables in regression).
We now select the optimal combination from each group of variables
in regression as follows (we omit the partial regression coefficients):

Table III

  q               Variables in Regression      Associated Conditional
  Group Number    (Optimal Combination)        Variances of $y_5$

  0               None                         $\sigma_{55}$ = 226.3133
  1               y4                           $\sigma_{55.4}$ = 73.6553
  2               y1, y2                       $\sigma_{55.1,2}$ = 4.8261
  3               y1, y2, y4                   $\sigma_{55.1,2,4}$ = 3.9982
  4               y1, y2, y3, y4               $\sigma_{55.1,2,3,4}$ = 3.9891

From table III we immediately see that most of the reduction of
the conditional variance of $y_5$ can be done by introducing only two
of the possible four variables into regression, namely $y_1$ and $y_2$.
Very little more is accomplished by using the other two variables,
given that $y_1$ and $y_2$ are going to be used in regression.
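The search that produces table III from table II can be sketched as below. The dictionary simply transcribes the conditional variances of table II (keys are the subscripts of the variables in regression); the names `COND_VAR` and `best_subset_of_size` are illustrative and do not come from the programs described in this paper.

```python
from itertools import combinations

# Conditional variances of y5 from table II, keyed by the variables in regression.
COND_VAR = {
    (): 226.3133,
    (1,): 105.4739, (2,): 75.5280, (3,): 161.6167, (4,): 73.6553,
    (1, 2): 4.8261, (1, 3): 102.2551, (1, 4): 6.2303,
    (2, 3): 34.6208, (2, 4): 72.4065, (3, 4): 14.6447,
    (1, 2, 3): 4.0096, (1, 2, 4): 3.9982, (1, 3, 4): 4.2368, (2, 3, 4): 6.1506,
    (1, 2, 3, 4): 3.9891,
}


def best_subset_of_size(q, predictors=(1, 2, 3, 4)):
    """Return the q-variable combination with the smallest conditional variance."""
    return min(combinations(predictors, q), key=lambda subset: COND_VAR[subset])


# One optimal combination per group number q = 0, 1, ..., 4.
optimal = {q: best_subset_of_size(q) for q in range(5)}
for q, subset in optimal.items():
    print(q, subset, COND_VAR[subset])
```

The sketch examines every combination, which is exactly the cost that the following paragraph warns about for larger p.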

Note that the five variate normal is easily handled by an electronic
computer because only $\sum_{j=1}^{4} \binom{4}{j} = 15$ prediction
equations had to be computed, none with more than five variables
involved. On the other hand, the 18 variate normal, for example,
requires $\sum_{j=1}^{17} \binom{17}{j} = 131,071$ prediction equations,
most of which involve many variables. Hence this procedure is not
always feasible even when today's high speed electronic computers are
available.
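The two counts quoted above follow from summing binomial coefficients over every non-empty combination of the p-1 candidate predictors, which equals 2^(p-1) - 1. A short check, with an illustrative function name:

```python
from math import comb


def count_prediction_equations(p):
    """Number of non-empty combinations of the p-1 predictors of y_p."""
    return sum(comb(p - 1, j) for j in range(1, p))


five_variate = count_prediction_equations(5)
eighteen_variate = count_prediction_equations(18)
print(five_variate, eighteen_variate)
```

Each additional variable roughly doubles the count, which is why the exhaustive search becomes impractical so quickly.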

It is interesting to note that when all four variables are in
regression the values of the regression coefficients do not suggest
which variables might be best to eliminate from regression. In fact,
none of the values are close enough to zero to indicate that any
should be removed.

In this chapter it has been shown that we can expect the amount
of reduction in the conditional variance of $y_p$ to be less per variable
added to regression when the number of variables in the optimal
combination becomes larger. Thus, if one were willing to state in advance
his maximum allowable value of the conditional variance of $y_p$, the
problem would be a straightforward one of searching table II for the
minimum number of variables producing that conditional variance or
less. We now restate this same problem in the above terms:

"To find some satisfactorily small number of variables,
q, (q ≤ p-1), that, when used to predict $y_p$, reduces $\sigma_{pp\cdot 1,\ldots,q}$
to some satisfactorily small fraction of the unconditional variance
Chapter VI
THE STEP-WISE PROCEDURE

We now discuss an alternate procedure of searching for optimal
combinations of variables in regression, called the step-wise
procedure. This procedure has the advantage of reducing the number of
prediction equations to be solved from $\sum_{r=1}^{p-1} \binom{p-1}{r}$,
as in chapter V, to p-1 or less, thus keeping the number of
computations to within the capability of today's high speed electronic
computers. We shall see that the combination of variables selected by
this method is not always optimal, i.e., it is possible that a
different set of the same number of variables might yield a more
accurate prediction equation for $y_p$. However, practical experience
indicates that sets decidedly better than those discovered by the
procedure outlined in this chapter
are  rare   A  ,  page  I9«  We  shall  discuss  additional  problems  en- 
countered when  the  step-wise  procedure  is  applied  to  a  sample.  The 
need  for  statistical  tests  at  each  step  is  demonstrated  and  an  actual 
test  is  developed. 

The step-wise procedure is as follows: At each step every variable
not yet in regression is examined to see how much the conditional
variance of $y_p$ would be decreased if it, alone, were added to the
variables already in regression, i.e., assuming q variables are already
in regression, the quantity
$\sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,m}$ is computed
for each variable, $y_m$, still not in regression. The variable to be
added to regression is $y_k$, the variable for which this computation is
greatest; i.e., $y_k$ is chosen from the variables not in regression, $y_m$,
so that

$$\sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,k} = \max_m \left[ \sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,m} \right]$$

or equivalently, such that:

$$\sigma_{pp\cdot 1,\ldots,q,k} = \min_m \left[ \sigma_{pp\cdot 1,\ldots,q,m} \right].$$

We illustrate this procedure by applying it to the p variate
normal specified by equations 4.1 and 4.2. This illustration can
be followed most easily if reference is made to table II of
chapter V.

Step I:    Compute all four conditional variances of group 1,
           and choose the smallest value (73.6553).
           action:   add variable y4 to regression
           results:  variables in regression: y4
                     σ55·4 = 73.6553

Step II:   Compute the conditional variances of group 2 that
           include variable y4 in regression, and choose the
           smallest value (6.2303).
           action:   add variable y1 to regression
           results:  variables in regression: y1, y4

Step III:  Compute the conditional variances of group 3 that
           include variables y1 and y4 in regression, and
           choose the smallest value (3.9982).
           action:   add variable y2 to regression
           results:  variables in regression: y1, y2, y4

Step IV:   Add the last variable.
           results:  variables in regression: y1, y2, y3, y4
                     σ55·1,2,3,4 = 3.9891

As in the preceding chapter, we immediately see that most of
the conditional variance of $y_5$ can be eliminated by using only two
of the possible four variables in regression. However, this time
the pair chosen were variables $y_1$ and $y_4$ instead of $y_1$ and $y_2$,
producing a conditional variance of 6.2303 instead of 4.8261.
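The four steps of this illustration can be traced with a short sketch that looks up the conditional variances of table II instead of recomputing them. The names `COND_VAR` and `forward_stepwise` are illustrative only; the sketch is a restatement of the selection rule of this chapter, not the program described later in this paper.

```python
# Conditional variances of y5 from table II, keyed by the variables in regression.
COND_VAR = {
    (): 226.3133,
    (1,): 105.4739, (2,): 75.5280, (3,): 161.6167, (4,): 73.6553,
    (1, 2): 4.8261, (1, 3): 102.2551, (1, 4): 6.2303,
    (2, 3): 34.6208, (2, 4): 72.4065, (3, 4): 14.6447,
    (1, 2, 3): 4.0096, (1, 2, 4): 3.9982, (1, 3, 4): 4.2368, (2, 3, 4): 6.1506,
    (1, 2, 3, 4): 3.9891,
}


def forward_stepwise(cond_var, predictors=(1, 2, 3, 4)):
    """Add, at each step, the variable giving the smallest conditional variance."""
    in_regression = []
    path = []
    while len(in_regression) < len(predictors):
        remaining = [k for k in predictors if k not in in_regression]
        best = min(
            remaining,
            key=lambda k: cond_var[tuple(sorted(in_regression + [k]))],
        )
        in_regression.append(best)
        path.append((best, cond_var[tuple(sorted(in_regression))]))
    return path


path = forward_stepwise(COND_VAR)
for step, (variable, variance) in enumerate(path, start=1):
    print(step, "add y%d" % variable, variance)
```

Only 4 + 3 + 2 + 1 = 10 table entries are examined here, against 15 for the exhaustive search, and the second step settles on the pair y1, y4 rather than the optimal pair y1, y2.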

The step-wise procedure is equally applicable to analysis of
a sample of size n. In this case all information is obtained from
the sample vector, Z, and sample V-C matrix, S. In particular,
the values of the sample conditional variances, $s_{pp\cdot 1,\ldots,q}$,
rather than $\sigma_{pp\cdot 1,\ldots,q}$, are used at each step to
determine the next variable to enter regression. As before, p-1
prediction equations, and associated estimated conditional variances
of $y_p$, $s_{pp\cdot 1,\ldots,q}$, can be obtained. Each succeeding
equation will contain one more variable in regression, and usually
will have a smaller value of $s_{pp\cdot 1,\ldots,q}$ (1). Now, as in
chapter V, the most acceptable combination of variables in regression,
for which the estimated conditional variance of $y_p$ is small enough,
can be chosen.

(1) The exception can occur when the sample size, n, is small. See
Hald's example, table 20.6, where n = 13 [4].
At this point we must consider a problem that is ever present
whenever a sample is used as a source of information. In the present
case the problem is stated as follows: How do we know that the sample
size, n, was large enough, so that the conditional variance associated
with the combination we have just selected is accurate? (We will
always assume that n is greater than p.)

Intuitively, if n is just a little larger than p we should not
have much confidence in sample vector Z and sample V-C matrix S,
nor in the estimated regression coefficients or conditional variances
of $y_p$. In fact we shouldn't be surprised if a second sample of the
same size were to produce a completely different set of variables when
the same step-wise procedures are used. On the other hand, as n
approaches infinity the sample values Z and S approach the true values
of U and $\Sigma$. It is clear that at each step, each variable that is
a candidate to enter regression should be given a statistical test of
some kind.

Suppose q variables, $y_1, \ldots, y_q$, are already in regression with
estimated conditional variance of $y_p$ given by $s_{pp\cdot 1,\ldots,q}$,
and suppose that we are considering variable $y_k$ for addition to
regression. It can be shown that if actually
$\sigma_{pp\cdot 1,\ldots,q,k} = \sigma_{pp\cdot 1,\ldots,q}$, then the
statistic

$$(6.1) \qquad F = \frac{(n-q-1)\, s_{pp\cdot 1,\ldots,q} - (n-q-2)\, s_{pp\cdot 1,\ldots,q,k}}{s_{pp\cdot 1,\ldots,q,k}}$$

has the F distribution with 1 and n-q-1 degrees of freedom [4],
section 6.4. Furthermore, statistic F will tend to be greater than
$F_\alpha(1, n-q-1)$ if $\sigma_{pp\cdot 1,\ldots,q,k}$ is actually less
than $\sigma_{pp\cdot 1,\ldots,q}$.
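Formula 6.1 is simple to evaluate once the two sample conditional variances are known. The sketch below is hedged on one point that this section does not state: it assumes each sample conditional variance is the residual sum of squares divided by n-q-1. Under that assumption the table II figure for y4, which was obtained with divisor n-1 = 12 from Hald's sample of 13, has to be rescaled before use. The function name is illustrative.

```python
def f_statistic_6_1(n, q, s_q, s_qk):
    """Formula 6.1: F for adding variable y_k to the q variables in regression.

    s_q  : sample conditional variance with q variables in regression
    s_qk : sample conditional variance after y_k is added
    Assumption (not stated in the text): each s is a residual sum of
    squares divided by its own degrees of freedom.
    """
    numerator = (n - q - 1) * s_q - (n - q - 2) * s_qk
    return numerator / s_qk


n = 13                              # sample size in Hald's example
s_pp = 226.3133                     # no variables in regression (divisor n - 1 = 12)
s_pp_4 = 12 * 73.6553 / 11          # table II value rescaled to divisor n - 2 = 11
f_y4 = f_statistic_6_1(n, 0, s_pp, s_pp_4)
print(round(f_y4, 2))
```

Under the stated assumption the result agrees with the first-step value for y4 that appears in table IV of the next chapter.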

We immediately encounter a new complication: The above statistic
F behaves as stated above so long as variable $y_k$ is studied by itself.
However, the selection of $y_k$ from among those variables still not in
regression was not completely at random; $y_k$ was chosen at this time
because it was estimated to be the "best" variable to add at this
step. In other words, we are in effect computing F for a number of
variables and choosing the variable for which F is the largest. It
is important to realize that due to this method of selection, the F
statistic used with the selected variable $y_k$ will tend to be larger
than would be expected on the average if variable $y_k$ were to be studied
as an individual variable alone. Intuitively, this effect should be
stronger with the first variables added to regression, since those
variables for which F is large due to randomness are removed from
those not in regression early. Suggested procedures for compensating
for this are discussed in a later chapter.

Let $\alpha$ be the probability of erroneously concluding that
$\sigma_{pp\cdot 1,\ldots,q,k}$ is less than $\sigma_{pp\cdot 1,\ldots,q}$
whenever actually they are equal ($\alpha$ is usually chosen to be .05).
This error is usually called the type one error. Suppose now, at each
step we compute the statistic F of formula 6.1, and compare with the
value of $F_\alpha(1, n-q-1)$, which can be found in tables of the F
distribution. If the sample size is too small the power of the test
will be low. This means that the actual difference between
$\sigma_{pp\cdot 1,\ldots,q}$ and $\sigma_{pp\cdot 1,\ldots,q,k}$ can be
substantial and still, the probability that the computed statistic, F,
will exceed $F_\alpha(1, n-q-1)$ can be small. (Of course, this
probability will always be greater than $\alpha$.) Failing to detect a
real difference in this way is usually called the type two error.

On the other hand, given that $\sigma_{pp\cdot 1,\ldots,q}$ is actually
greater than $\sigma_{pp\cdot 1,\ldots,q,k}$ (the only alternative being
that they are equal), regardless of how small the actual difference,
the probability that statistic F exceeds $F_\alpha(1, n-q-1)$ can be
made as close to one as we please by increasing sample size, n,
indefinitely.

Meanwhile, among those variables for which $\sigma_{pp\cdot 1,\ldots,q}$
is actually equal to $\sigma_{pp\cdot 1,\ldots,q,k}$, approximately
$\alpha \times 100$ percent are expected to "pass" the F test (i.e.,
$F > F_\alpha(1, n-q-1)$) independently of the sample size, n.

We have just seen that the two important factors that affect the
probability that variable $y_k$ will pass a particular F test are the
amount by which the actual values of $\sigma_{pp\cdot 1,\ldots,q}$ and
$\sigma_{pp\cdot 1,\ldots,q,k}$ differ, and the size of the sample, n.
Thus, the decision rule we might use is to terminate the step-wise
procedure at any step at which all variables still not in regression
fail to pass the F test. With this decision rule, the F test will limit
the variables in regression to those whose contribution to reduction in
conditional variance of $y_p$ appears to be large enough for the given
sample to measure.

In  the  next  chapter  we  shall  consider  additional  halt  criteria 
which  an  experimenter  may  wish  to  impose  on  the  step-wise  process. 
Chapter VII

AUTOMATIC REGRESSION ANALYSIS - CRITERIA
FOR HALTING STEP-WISE REGRESSION


In this chapter we shall develop useful procedures for conducting
automatic regression analysis on a sample of size n of a p
variate normal using a high speed electronic computer. Efroymson
[3] has developed an algorithm very suitable for computer use in
which any single variable can be added to, or eliminated from,
regression (depending upon its former status). At any step the
regression coefficients, conditional variance of $y_p$, multiple
correlation coefficient of $y_p$ on the variables in regression, and many
other desirable parameters can be computed easily and printed out.
Useful criteria for halting the regression process are discussed
and developed.

Given  a  sample  of  size  n  of  a  p  variate  normal,  formulas  for 
computing  vector  Z  and  matrix  S  have  already  been  described.  Also, 
basically  we  shall  use  the  step-wise  procedure  of  adding  variables 
to  regression.  The  most  important  remaining  problem  is  to  consider 
how  the  user  of  an  automatic  regression  analysis  computer  program 
can  specify  in  advance  of  the  computer  run,  reasonable  criteria  for 
halting  the  step-wise  procedure. 

So far, it appears that a satisfactory criterion for stopping
the regression process has never been fully developed to suit
automatic step-wise regression. Miller [7] proposes adding variables
until the F test fails. He also proposes a method of adjusting the
level for which the critical F is chosen ($1 - \alpha$ in chapter VI) to
compensate for the fact that the method of choosing each variable
to enter regression is not a random choice:

     In order to derive a test for the statistical significance
     of $X_j$, the following analysis may be performed: When a
     predictor is chosen at random from a group of predictors,
     an F test is performed where the critical F is usually
     taken at the 95% level. This allows for a one in twenty
     chance for considering this predictor significant when in
     fact it is not. In the screening procedure the selection
     of $X_j$ is not a random choice. Therefore, it is necessary
     to determine at what probability level the critical F
     should be taken while still specifying a one in twenty
     chance occurrence.

     For the screening procedure it appears proper to make the
     level for which the critical F is chosen a function of the
     number of possible predictors, n. The ordinary 95% level
     F can be expressed as

     $$F_{.95} = F_{(1 - 1/20)},$$

     and for the screening procedure the 95% level is

     $$F_{.95} = F_{(1 - 1/(20 \cdot n))}.$$

Intuitively, Miller's solution seems to be somewhat extreme.
For example, if p = 50 (and $\alpha$ = .05) then at the first step the
level chosen for the critical F, $\alpha'$, would be computed as follows:

$$1 - \frac{\alpha}{p} = 1 - \frac{1}{20 \times 50} = .999; \qquad \alpha' = 1 - .999 = .001,$$

so that the value used for comparison would be $F_{.999}(1,49) = 12.2$,
rather than $F_{.95}(1,49) = 4.03$ when no adjustment is made. In this
case the critical F value is arbitrarily tripled only because there
are 50 variables still not in regression. Granted that the critical
F should be adjusted upward in order to maintain a "one in twenty
chance occurrence", it would seem that due to lack of information
as to the extent of this non-random effect, one should make such an
adjustment more conservatively than this.
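The arithmetic of Miller's adjustment in the example above amounts to dividing the type one error probability by the number of candidate predictors. A minimal sketch, with an illustrative function name; the critical F values themselves would still be read from tables of the F distribution and are not computed here.

```python
def miller_adjusted_level(alpha, n_candidates):
    """Level at which the critical F is taken under Miller's adjustment.

    alpha        : nominal type one error probability (e.g. .05, one in twenty)
    n_candidates : number of predictors the selected variable was chosen from
    """
    return 1.0 - alpha / n_candidates


level = miller_adjusted_level(0.05, 50)   # 1 - 1/(20 x 50)
alpha_prime = 1.0 - level
print(round(level, 3), round(alpha_prime, 3))
```

With a single candidate the function returns the ordinary level 1 - alpha, so the unadjusted test is the special case n_candidates = 1.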

Perhaps  a  satisfactory  "hedge"  might  be  to  use  the  adjusted 
level : 


log  p  log  p 

where K is a constant inserted by the program user for his
particular sample.

Conceivably, one might wish to make no adjustment at all for
this effect because the consequences of increasing the type two
error during the early steps are so detrimental to the step-wise
procedure.

Efroymson [3] proposes two F tests at each step. His program
first compares each variable, $y_i$, currently in regression with an
appropriate "min F" critical value to see if it still passes the
F test of significance. If a variable that no longer passes is
discovered, the action at that step is to remove the variable from
regression. By setting min F to a value slightly less than the
standard critical value used for adding variables, the possibility of
creating an endless loop is avoided.

This feature is appealing because new combinations in regression
obtained in this manner are always more nearly optimal (as far as
the sample is concerned) than was the preceding combination of the
same size; yet the number of computer instructions required to do
this operation is minimal.
In chapter VI it was shown that the choice of $F_\alpha(1, n-q-1)$ as
the value of critical F is made in an attempt to limit the variables
in regression to those whose contribution to reduction in conditional
variance of $y_p$ is large enough for the given sample to measure. It
is clear that Efroymson's double F test contributes to this effort
by insuring that all variables in regression continue to pass the
F test even after subsequent variables have been added.

It is impossible to anticipate here all uses for different
combinations of specified stopping criteria. We already have seen
that statisticians so far have only provided general guidelines
in this area. This is mainly because each individual p variate
normal distribution has its own set of complications, and for each
computer run on a given sample the experimenter may have varying
amounts of prior information regarding the p variate normal he is
studying. Thus, for any automatic regression analysis computer
program it is important that the user of the program be able to
specify halt criteria with as much flexibility as possible.

Perhaps the most important aspect of each halt criterion is
that it must be specifiable in a manner most meaningful to the
experimenter. For example, some experimenters under certain conditions
may not look upon the F test of chapter VI as being useful to them at
all. Quite likely, an experimenter may wish to replace
$F_\alpha(1, n-q-1)$ with a value, say A, to be the critical amount of
reduction of the variance of $y_p$ as a stopping rule; or, he may want
to specify both critical values. Although A and $F_\alpha(1, n-q-1)$
are in different units, it is clear that the A test is equivalent to
an F test, so that in specifying both tests the experimenter is merely
having the computer apply whichever test is the most stringent at
each step.

The following example illustrates most of the points covered
in the last few paragraphs. We show here how a suitable choice of
min F and critical F, artificially chosen, can aid Efroymson's
double F test procedure to find a more nearly optimal combination
of variables in regression than already obtained by the step-wise
procedure at a previous step. To do this we take an example worked
out by Hald [6], section 20.3. In this example, Hald used data
from a sample of size 13 of a five variate distribution which we
will assume here to be normal. The sample vector, Z, and sample
V-C matrix, S, are the same as those shown by equations 4.1 and
4.2 in this paper, which in chapter IV were used to define U and
$\Sigma$, respectively. In the following illustration we shall consider
4.1 and 4.2 to be computed Z and S as in Hald's example.

From Hald's example we compute the F statistic [6.1] for each
variable in regression and not in regression at each step. See
table IV below. For variables in regression, $y_k$, the value
computed is the F statistic that would be computed for $y_k$ if it, alone,
were removed from regression first. These values pertaining to
variables currently in regression are underscored in table IV. In
order to illustrate the above points, the F test using
$F_\alpha(1, n-q-1)$ was eliminated.

We now choose the artificial values of critical F and min F to
be 3.5 and 3.0 respectively. With this choice we shall obtain the
optimal combination of variables $y_1$ and $y_2$, where the regular
forward step-wise procedure yielded variables $y_1$ and $y_4$ in
chapter VI.
Table IV

        Variables in Regr.          Computed F Statistic [6.1]
Step    Before This Step          y1         y2         y3         y4

 1      None                    12.60      21.96       4.40     (22.80)
 2      y4                    (108.16)       .17      40.30     _22.80_
 3      y1, y4                _108.16_     (5.03)      4.24    _159.21_
 4      y1, y2, y4            _154.02_     _5.03_       .01      _1.86_
 5      y1, y2                the optimal combination for
                              two variables in regression

The variables added to or eliminated from regression were chosen
according to Efroymson's double F test procedure. Recall that no
variable was to be added at any given step if the F value of one of
the variables already in regression got below 3.0 (min F). Hence
at step 4, variable $y_4$ was eliminated yielding the optimal
combination $y_1$, $y_2$. At each previous step, the variable added
(whose value is enclosed in parentheses) was chosen because its F
statistic was the largest among those still not in regression and was
also greater than the critical F value which was artificially chosen
to be 3.5.
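The walk through table IV can be sketched as a short program. Several assumptions are made here and are not taken from the source: the conditional variances of table II are treated as residual sums of squares divided by n-1 = 12, each F is computed as the change in residual sum of squares divided by the residual mean square of the larger combination, and all names (`efroymson`, `partial_f`, and so on) are illustrative. It is a sketch of the double F test idea, not Efroymson's algorithm as programmed.

```python
N = 13  # sample size in Hald's example

# Conditional variances from table II (assumed divisor N - 1), keyed by variables.
COND_VAR = {
    (): 226.3133,
    (1,): 105.4739, (2,): 75.5280, (3,): 161.6167, (4,): 73.6553,
    (1, 2): 4.8261, (1, 3): 102.2551, (1, 4): 6.2303,
    (2, 3): 34.6208, (2, 4): 72.4065, (3, 4): 14.6447,
    (1, 2, 3): 4.0096, (1, 2, 4): 3.9982, (1, 3, 4): 4.2368, (2, 3, 4): 6.1506,
    (1, 2, 3, 4): 3.9891,
}


def rss(subset):
    """Residual sum of squares implied by a table II conditional variance."""
    return (N - 1) * COND_VAR[tuple(sorted(subset))]


def partial_f(smaller, larger):
    """F for the single variable by which `larger` exceeds `smaller`."""
    residual_df = N - len(larger) - 1
    return (rss(smaller) - rss(larger)) / (rss(larger) / residual_df)


def efroymson(critical_f, min_f, predictors=(1, 2, 3, 4), max_steps=20):
    current, history = set(), []
    for _ in range(max_steps):
        # First test: does every variable in regression still pass min F?
        f_out = {k: partial_f(current - {k}, current) for k in current}
        if f_out:
            worst = min(f_out, key=f_out.get)
            if f_out[worst] < min_f:
                current.remove(worst)
                history.append(("remove", worst))
                continue
        # Second test: add the best candidate if it exceeds critical F.
        f_in = {k: partial_f(current, current | {k})
                for k in predictors if k not in current}
        if not f_in:
            break
        best = max(f_in, key=f_in.get)
        if f_in[best] <= critical_f:
            break
        current.add(best)
        history.append(("add", best))
    return sorted(current), history


final_set, history = efroymson(critical_f=3.5, min_f=3.0)
print(final_set, history)
```

Because min F (3.0) is below critical F (3.5), a variable removed with F under 3.0 cannot immediately re-enter, which is the loop-avoidance point made earlier in this chapter.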

This example illustrates some complexities that arise during
the regression process that are still not completely explainable
analytically. For instance, the relative values of statistic F
changed drastically as the combination of variables in regression
changed. These values correspond to relative amounts of reduction
of conditional variance that would be due to the corresponding
variable if it were (or is) in regression. Thus, when $y_4$ was added
to regression at the first step the relative contribution in variance
reduction due to $y_1$ jumped from 13 to 108, implying that $y_1$ and $y_4$
are much more powerful together than their sum when each is used
alone.

This example also suggests reasons why an experimenter may wish
to specify critical F values artificially, especially if results of
prior computer runs are available.

It was suggested earlier that instead of keeping track of
computed values of F [6.1], requiring specification of artificial
critical F's on the part of the experimenter, it might be simpler for
him to keep track of actual amounts of variance reduction of $y_p$ and
make up artificial values of A in units of variance reduction of $y_p$.
Also it is clear that the experimenter may wish to specify a value,
say, min A ≤ A, which would become the critical amount of variance
reduction required of each variable in regression, in order to stay in
regression.

The following summary lists a few useful halt criteria which the
experimenter may wish to specify before the automatic regression
analysis is performed on a given sample. The automatic regression
program should permit the experimenter to specify any combination
of these criteria for any given computer run:

1. $F_\alpha(1, n-q-1)$ and min F (chapter VI)
2. A and min A (defined above)
3. Stop when the conditional variance of $y_p$ gets as low as
   V percent of the original variance, $s_{pp}$.
4. Stop when the conditional variance of $y_p$ gets as low as T.
5. Stop when W variables have been added to regression.

In chapter IX we shall propose a procedure using some of the
above halt criteria in searching for an optimal combination of
variables in regression.
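Criteria 3, 4 and 5 of the summary above lend themselves to a very small check that a program could run after each step. The sketch below is only an illustration of how such user-specified criteria might be combined; the function and parameter names are hypothetical, and a criterion left as None is treated as not specified.

```python
def should_halt(original_var, current_var, n_in_regression,
                v_percent=None, t_variance=None, w_max_vars=None):
    """Return True if any specified halt criterion (3, 4 or 5) is met.

    v_percent  : criterion 3, halt at V percent of the original variance
    t_variance : criterion 4, halt when the conditional variance reaches T
    w_max_vars : criterion 5, halt when W variables are in regression
    """
    if v_percent is not None and current_var <= original_var * v_percent / 100.0:
        return True
    if t_variance is not None and current_var <= t_variance:
        return True
    if w_max_vars is not None and n_in_regression >= w_max_vars:
        return True
    return False


# Using conditional variances from the chapter VI illustration:
after_step_1 = should_halt(226.3133, 73.6553, 1, v_percent=5.0)
after_step_2 = should_halt(226.3133, 6.2303, 2, v_percent=5.0)
print(after_step_1, after_step_2)
```

Any combination of the three keyword arguments may be given, in keeping with the requirement that the user be free to combine criteria for a given run.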

In chapter V it was stated that one good reason for reducing
the number of variables in regression might be to reduce the cost
of observing the variables from which each future prediction of $y_p$
is to be computed. Often some of the variables cost considerably
more to observe than others, and the experimenter may not be so
interested in reducing the total number of variables to observe
as he is in reducing the total cost to observe the values of the
variables in regression for each prediction of $y_p$ to be made later.
Thus, it is desirable that the experimenter be able to specify
observation costs, $c_j$ (say, in dollars), for each "independent"
variable $y_1, \ldots, y_{p-1}$, and have the automatic regression
analysis operation reflect these costs when selecting variables to go
into regression.

The "cost option" should differ from the regular option only in
the criterion used at each step to determine which variable is to be
added to regression. Recall that the regular option calls for choosing
the variable that will reduce the variance of $y_p$ the most, to be the
variable added.

In the cost option, at each step, those variables still not in
regression are determined. Then, instead of $y_k$ for which
$\sigma_{pp\cdot 1,\ldots,q,k}$ is least (as estimated by
$s_{pp\cdot 1,\ldots,q,k}$), $y_j$ is chosen for which
$c_j / (\sigma_{pp\cdot 1,\ldots,q} - \sigma_{pp\cdot 1,\ldots,q,j})$
is least; i.e., $y_j$ is chosen on the basis that it is cheapest in
terms of "dollars" to observe per unit of variance reduction of $y_p$
due to adding $y_j$. It is clear that the standard option is just a
special case of the cost option in which the observation costs are all
specified to be equal.
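The selection rule of the cost option can be sketched for the first step of the five variate example, using the one-variable conditional variances of table II. The observation costs in `hypothetical_costs` are invented purely for illustration, and the function name is likewise not from the source. The sketch assumes every candidate gives a strictly positive variance reduction.

```python
SIGMA_55 = 226.3133                     # unconditional variance of y5
ONE_VARIABLE = {1: 105.4739, 2: 75.5280, 3: 161.6167, 4: 73.6553}


def cost_option_choice(costs, current_var, candidate_vars):
    """Pick the variable that is cheapest per unit of variance reduction.

    costs          : observation cost c_j for each candidate j
    current_var    : conditional variance with the present variables
    candidate_vars : conditional variance if candidate j were added
    """
    def cost_per_unit(j):
        return costs[j] / (current_var - candidate_vars[j])

    return min(candidate_vars, key=cost_per_unit)


equal_costs = {1: 1.0, 2: 1.0, 3: 1.0, 4: 1.0}
hypothetical_costs = {1: 1.0, 2: 5.0, 3: 1.0, 4: 10.0}   # made-up dollar figures

standard_choice = cost_option_choice(equal_costs, SIGMA_55, ONE_VARIABLE)
cost_choice = cost_option_choice(hypothetical_costs, SIGMA_55, ONE_VARIABLE)
print(standard_choice, cost_choice)
```

With equal costs the rule reduces to picking the largest variance reduction, which is the standard option; with the invented costs the cheaper variable y1 is preferred over y4 at the first step.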

Since optimality is now measured in terms of minimum cost to
observe per unit of variance reduction instead of maximum variance
reduction, the program user must be able to specify a halt criterion
so that whenever the cost to observe a variable in regression
becomes greater than, say, max C, the program will remove it, and
whenever all variables still not in regression would cost more than,
say, C dollars per unit of variance reduction, if added, the program
should halt regression. Now, min A and A are not needed as halting
criteria for the cost option. However, the experimenter should
still have the option of including other halt criteria summarized
above.

To summarize, neither Miller's nor Efroymson's stopping rules
are optimal. Both basically use only the statistical F test of
chapter VI as a decision rule for halting. It has been illustrated
here that additional decision criteria that can be specified by the
experimenter in terms more meaningful to him may greatly facilitate
his search for optimal combinations of variables in regression.
Chapter VIII
THE MV REGRESSION AND MV SIM COMPUTER PROGRAMS

The purpose of this chapter is to describe a computer program,
called MV REGRESSION, which performs automatic regression analysis
on a sample of size n. Also briefly outlined is program MV SIM
which generates samples of size n from a specified p variate normal.
The detailed operation of MV SIM is described in appendix A. Both
programs are written in NELIAC compiler language. Operation of
these programs on the Control Data Corporation model 1604 computer
at the U. S. Naval Postgraduate School has produced all of the
computations involved in the examples throughout this paper as well as
the test results discussed in chapter IX and appendix B.

Briefly, MV SIM will analyze a specified p variate normal (given
by U and $\Sigma$) and print out true regression coefficients and
associated $\sigma_{pp\cdot 1,\ldots,q}$ for any set(s) of q variables
specified by the program user (q ≤ p-1). Next, MV SIM will generate a
sample of size n from the specified p variate normal and compute
sample vector Z and sample V-C matrix, S. Before turning control to
program MV REGRESSION, MV SIM performs statistical tests on Z and S,
and prints out results of these tests, but takes no action based on
these results. These statistical tests and actual computer run results
are discussed in detail in appendix B.

Before proceeding with a description of MV REGRESSION, it is
interesting to consider the powerful research tool one has when he
can specify a p variate normal (U and $\Sigma$) and quickly generate
random samples from that distribution. It is obvious that this
operation saves much time in gathering data, or in "making up"
reasonable samples when it is desired to test the operation of a
regression analysis program such as MV REGRESSION. (This was the
case when computations for examples in this paper were required.)
But MV SIM offers the statistician a much more useful research
capability than this. Using MV SIM one can make accurate comparisons
of the results of any regression scheme with true regression
equations, conditional variances, etc., which MV SIM computes from
the specified U and $\Sigma$. Of course, for such a comparison, the
regression scheme must be applied to a sample drawn by MV SIM from
the distribution specified by U and $\Sigma$.

The sampling capability of MV SIM also makes it possible to
perform empirical sampling studies of random variables whose
distributions are difficult to find theoretically. One such study,
now in progress, is discussed in chapter IX.

We finish this chapter with a detailed description of program
MV REGRESSION.

The inputs of MV REGRESSION are as follows:

1.  Start with a sample of n observations of the p variate
    normal. If MV SIM supplies the sample, it will supply
    it in the form of Z and S.

2.  Specify "standard" or "cost" option (see chapter VII).
    If cost option, give cost of observation, $c_i$, for
    variables $y_i$, for i = 1,...,p-1. If the user specifies
    "standard", he still may specify costs and obtain
    printed cost data even though the "regular" criterion is
    used as far as entering variables into regression is
    concerned.
3.  Specify criteria for halting regression of a sample:

    A.  1)  $F_{\alpha}$, the value to be compared with statistic
            F for adding variables to regression.
        2)  Min F, a value less than $F_{\alpha}$, to be compared
            with statistic F for removing variables from
            regression.

    B.  1)  Last variable added reduced the conditional
            variance of $y_p$ by less than X. (Not used for
            cost option.)
        2)  Last variable added, $y_k$, costs more than C
            dollars to observe per unit of variance reduction
            of $y_p$ due to adding $y_k$ (used only for cost option).

    C.  Conditional variance of $y_p$ became less than T.

    D.  Number of variables in regression reached W.
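
The halt criteria B, C and D above amount to a handful of comparisons
made after each step. A minimal sketch in Python follows; the function
name and argument names are hypothetical, and criterion A (the F
tests) is deliberately left out because it depends on the F statistic
of chapter VI.

```python
def halt_reasons(n_in, cond_var, last_reduction, last_cost_per_unit,
                 w_max, t_min, x_min, c_max=None):
    """Return the labels (B1, B2, C, D) of every halt criterion that is met.

    n_in               number of variables now in regression
    cond_var           current estimate of the conditional variance of y_p
    last_reduction     variance reduction due to the last variable added
    last_cost_per_unit cost of the last variable per unit of that reduction
    w_max, t_min, x_min, c_max  the input limits W, T, X and C
    """
    reasons = []
    if last_reduction < x_min:
        reasons.append("B1")
    if c_max is not None and last_cost_per_unit > c_max:
        reasons.append("B2")          # checked only when a limit C is given
    if cond_var < t_min:
        reasons.append("C")
    if n_in >= w_max:
        reasons.append("D")
    return reasons
```

An empty list means the step-wise procedure continues; any non-empty
list halts it.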

Before step 1 of the regression operation, MV REGRESSION prints
out $s_{pp}$, and (optionally) the RR matrix. (The RR matrix is a
p x p matrix which contains all current data in compact form from
which all required parameters at each step can be computed.
Initially, it is a matrix of sample correlation coefficients, which
is easily computed from sample V-C matrix S. See Efroymson [3].)
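
To make the role of the RR matrix concrete, the following Python
sketch shows one common form of the pivot step used in step-wise
regression. It is an illustration under stated assumptions, not a
transcription of the program: the function names are hypothetical,
and the sign convention on the pivot row and pivot column is chosen
so that the result agrees with the RR matrices printed later in this
chapter.

```python
def correlation_matrix(s):
    """Convert a variance-covariance matrix into a correlation matrix."""
    sd = [s[i][i] ** 0.5 for i in range(len(s))]
    return [[s[i][j] / (sd[i] * sd[j]) for j in range(len(s))]
            for i in range(len(s))]


def sweep(a, k):
    """Return a copy of square matrix a after pivoting on diagonal element k.

    Pivoting on k enters variable k into regression; pivoting on the same
    k a second time removes it again.
    """
    n = len(a)
    d = a[k][k]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = 1.0 / d
            elif i == k:
                out[i][j] = a[k][j] / d           # pivot row
            elif j == k:
                out[i][j] = -a[i][k] / d          # pivot column
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return out
```

After a pivot, the last diagonal element is one minus the squared
multiple correlation of $y_p$ on the variables in regression, and the
last column holds the standardized regression coefficients of those
variables.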

At each step, after a variable has been added to regression,
the following data is printed:

I    a.  "Best" variable to have been added (variable with
         minimum $s_{pp\cdot 1,\ldots,q}$).
     b.  "Cheapest" variable to have been added.
     c.  Whichever of the two variables above that actually
         was added (a. if regular option, b. if cost option).

II   The value used in the F test for the added (or removed)
     variable. MV REGRESSION compared this value with the
     input value of $F_{\alpha}$ (or Min F).

III  a.  The square of the estimated new multiple correlation
         coefficient of $y_p$ on the variables in regression.
     b.  The estimate of the new conditional variance of
         $y_p$, $s_{pp\cdot 1,\ldots,q}$.

IV   The cost to observe the variable just added, $y_k$, per unit
     of conditional variance reduction due to the addition of
     this variable to regression at this time. This is computed
     as $c_k / (s_{pp\cdot 1,\ldots,q} - s_{pp\cdot 1,\ldots,q,k})$.

V    a.  A list of the new set of variables, $y_i$ (i = 1,...,q),
         in regression.
     b.  The estimated regression coefficients, $b_i$.

VI   The cost to observe the new set of variables in regression
     per unit of total variance reduction of $y_p$. This is
     computed as:

     $$\sum_{i=1}^{q} c_i \Big/ \left(s_{pp} - s_{pp\cdot 1,\ldots,q}\right)$$

VII  The new RR matrix (optional).

As indicated earlier, it is possible to specify costs of
observation, $c_i$, even though the standard option is used. In this
case, items IV and VI are still computed and printed, but of course
the "best" variable to add (item Ia) is still the one actually added.

At each step at which a variable is removed from regression,
item I above becomes "the variable just removed", and only items
II, III, V, VI, and VII are printed.

Minor changes to the program can be made to cause it to print
out other data after each step, such as estimated variances of the
estimated regression coefficients.

The next few pages show the actual program output of a regression
analysis performed by MV REGRESSION on a sample of size 300 of
a five variate normal. This sample was generated by the MV SIM
program using input vector U and V-C matrix $\Sigma$ given by 4.1
and 4.2.

MULTIVARIATE ANALYSIS (CONTINUED)   2 19 1963      PAGE

COMPUTER RUN DATA
NUMBER OF SAMPLES = 3

CRITERIA FOR CHOOSING WHICH VARIABLE TO ADD TO
REGRESSION (AMONG THOSE PASSING F TEST)

MAXIMUM REDUCTION OF THE CONDITIONAL VARIANCE OF Y 5

HOWEVER,

THE FOLLOWING COSTS OF OBSERVATION ARE SPECIFIED

        Y1          Y2          Y3          Y4          Y5
   10.0000     12.0000     16.0000     20.0000       .0000

ANY ONE OF THE FOLLOWING CONDITIONS CAN HALT REGRESSION STEPS

1)  NUMBER OF VARIABLES IN REGRESSION REACHED 4
2)  CONDITIONAL VARIANCE OF Y 5 BECAME LESS THAN 4.0
3)  LAST VARIABLE ADDED REDUCED THE CONDITIONAL VARIANCE
    OF Y 5 BY LESS THAN 2.0
4)  LAST VARIABLE ADDED COSTS MORE THAN 10.00 DOLLARS
    TO OBSERVE PER UNIT OF VARIANCE REDUCTION OF Y 5
5)  NO MORE VARIABLES (AMONG THOSE NOT IN REGRESSION)
    PASS THE F TEST OF SIGNIFICANCE

MULTIVARIATE ANALYSIS (CONTINUED)   2 19 1963      PAGE 8

SAMPLE NUMBER 1

SAMPLE OF SIZE 300 OF THE 5 VARIATE NORMAL

SAMPLE MEANS

        Y1          Y2          Y3          Y4          Y5
    7.4125     48.2586     11.9077     29.8140     95.4458

SAMPLE VARIANCE COVARIANCE MATRIX

        Y1          Y2          Y3          Y4          Y5
   31.5587     14.6327    -28.5108    -17.8985     55.3288
   14.6327    227.5360     -3.9154   -243.4449    171.7598
  -28.5108     -3.9154     39.1032     -7.6404    -39.8334
  -17.8985   -243.4449     -7.6404    276.5990   -191.4721
   55.3288    171.7598    -39.8334   -191.4721    199.0421

MULTIVARIATE ANALYSIS (CONTINUED)   2 19 1963      PAGE 10
ANALYSIS OF SAMPLE NUMBER 1

SAMPLE VARIANCE OF Y 5 = 199.0421

F LEVEL TO ENTER = 3.87      F LEVEL TO REMOVE = 3.7

RR MATRIX TO START

        Y1          Y2          Y3          Y4          Y5
    1.0000       .1726      -.8116      -.1915       .6981
     .1726      1.0000      -.0415      -.9703       .8070
    -.8116      -.0415      1.0000      -.0734      -.4515
    -.1915      -.9703      -.0734      1.0000      -.8160
     .6981       .8070      -.4515      -.8160      1.0000

MULTIVARIATE ANALYSIS (CONTINUED)   2 19 1963      PAGE 11

STEP 1

BEST VARIABLE TO ADD WAS Y 4

CHEAPEST VARIABLE TO ADD WAS Y 2

VARIABLE ADDED WAS Y 4

STATISTIC USED TO COMPARE WITH F(1, 298) = 595.9689

NEW MULTIPLE CORR COEFF SQUARED =      .3340
NEW CONDITIONAL VARIANCE =    66.7210

COST TO OBSERVE Y 4 IN DOLLARS PER UNIT VARIANCE REDUCTION =      .1511

NEW SET OF VARIABLES IN REGRESSION

4

COEFFICIENTS B(I)      B0 = 116.0842

    -.6922       .0000       .0000       .0000       .0000

COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5

20.0000 DOLLARS DIVIDED BY
132.3210 UNITS OF VARIANCE REDUCTION =      .1511

THE NEW RR MATRIX

        Y1          Y2          Y3          Y4          Y5
     .9632      -.0132      -.8256       .1915       .5417
    -.0132       .0582      -.1128       .9703       .0152
    -.8256      -.1128       .9946       .0734      -.5114
    -.1915      -.9703      -.0734      1.0000      -.8160
     .5417       .0152      -.5114       .8160       .3340


MULTIVARIATE ANALYSIS (CONTINUED)   2 19 1963      PAGE 12

STEP 2

BEST VARIABLE TO ADD WAS Y 1

CHEAPEST VARIABLE TO ADD WAS Y 1

VARIABLE ADDED WAS Y 1

STATISTIC USED TO COMPARE WITH F(1, 297) = 3089.6640

NEW MULTIPLE CORR COEFF SQUARED =      .0293
NEW CONDITIONAL VARIANCE =     5.8889

COST TO OBSERVE Y 1 IN DOLLARS PER UNIT VARIANCE REDUCTION =      .1643

NEW SET OF VARIABLES IN REGRESSION

1  4

COEFFICIENTS B(I)      B0 = 102.8895

    1.4124      -.6008       .0000       .0000       .0000

COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5

30.0000 DOLLARS DIVIDED BY
193.1531 UNITS OF VARIANCE REDUCTION =      .1553

THE NEW RR MATRIX

        Y1          Y2          Y3          Y4          Y5
    1.0380      -.0137      -.8571       .1988       .5624
     .0137       .0581      -.1241       .9730       .0226
     .8571      -.1241       .2868       .2376      -.0470
     .1988      -.9730      -.2376      1.0380      -.7082
    -.5624       .0226      -.0470       .7082       .0293

MULTIVARIATE ANALYSIS (CONTINUED)   2 19 1963      PAGE 13

STEP 3

BEST VARIABLE TO ADD WAS Y 2

CHEAPEST VARIABLE TO ADD WAS Y 2

VARIABLE ADDED WAS Y 2

STATISTIC USED TO COMPARE WITH F(1, 296) = 127.4672

NEW MULTIPLE CORR COEFF SQUARED =      .0205
NEW CONDITIONAL VARIANCE =     4.1344

COST TO OBSERVE Y 2 IN DOLLARS PER UNIT VARIANCE REDUCTION =     6.8394

NEW SET OF VARIABLES IN REGRESSION

1  2  4

COEFFICIENTS B(I)      B0 = 75.6177

    1.4258       .3643      -.2792       .0000       .0000

COST TO OBSERVE THIS SET OF VARIABLES PER UNIT
OF VARIANCE REDUCTION OF Y 5

42.0000 DOLLARS DIVIDED BY
194.9076 UNITS OF VARIANCE REDUCTION =      .2154

THE NEW RR MATRIX

        Y1          Y2          Y3          Y4          Y5
    1.0413       .2360      -.8864       .4285       .5677
     .2360     17.1986     -2.1349     16.7347       .3895
     .8864      2.1349       .0218      2.3150       .0012
     .4285     16.7347     -2.3150     17.3214      -.3292
    -.5677      -.3895       .0012       .3292       .0205

3)  LAST VARIABLE ADDED REDUCED THE CONDITIONAL VARIANCE
    OF Y 5 BY LESS THAN 2.0

Chapter IX
CURRENT STUDIES AND PROPOSALS FOR FUTURE RESEARCH

In this chapter we discuss tests that have been started using
programs MV SIM and MV REGRESSION. Also, plans for future research
are proposed.

Some tests (described in Appendix B) of a large number of
samples generated by MV SIM have been completed.

An empirical sampling study of the random variables involved
in the F test of chapter VI has been started, since the form of
their distribution is unknown and extremely difficult to obtain
in closed form. Actually, p-1 random variables, which we will call
$G_1,\ldots,G_{p-1}$, are under study at the same time. They are
defined by a specified p variate normal, the size, n, of each sample
of the p variate normal, and the method of computing values of $G_i$,
i = 1,...,p-1, from a sample, which is described next.

At step one, $G_1$ is defined as the maximum value of F (6.1), where
F is computed for each of the p-1 variables (none of which are in
regression yet). $G_2$ is dependent upon $G_1$ in the sense that $G_2$
is the value of max F (6.1) computed after the variable for which F
equals $G_1$ has been entered into regression. Thus, at step two,
max F is the maximum value of F for those p-2 variables still not in
regression. The step-wise procedure continues without the use of any
tests for halting so that a new variable is added at each step. Thus,
at step i, $G_i$ equals max F, where F is computed for each variable
still not in regression by step i. After $G_i$ is recorded, the
variable for which F = $G_i$ is entered into regression.

Since the values of F (6.1) depend upon the sample size, we see
that each sample of size n of a specified p variate normal produces
one value of each of the random variables $G_1,\ldots,G_{p-1}$. Also,
to obtain repeated sets of values of the same random variables, the
sample size must be kept constant.
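
The computation of $G_1,\ldots,G_{p-1}$ from one sample can be sketched
in Python as below. Two points are assumptions made for the sketch and
should be checked against chapter VI: the pivot step (sweep) is one
common form of the step-wise update, and the statistic F is taken to
be the usual partial F for adding one variable, with n - q - 2 residual
degrees of freedom when q variables are already in regression. The
function names are hypothetical.

```python
def sweep(a, k):
    """Pivot square matrix a on diagonal element k (enter variable k)."""
    n, d = len(a), a[k][k]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = 1.0 / d
            elif i == k:
                out[i][j] = a[k][j] / d
            elif j == k:
                out[i][j] = -a[i][k] / d
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return out


def max_f_sequence(r, n):
    """Forward selection with no halting tests.

    r is the sample correlation matrix of y_1..y_p (y_p last) and n the
    sample size. Returns (order, g): the 0-based order of entry and the
    recorded values G_1..G_{p-1}.
    """
    p = len(r)
    y = p - 1
    a = [row[:] for row in r]
    remaining = list(range(p - 1))
    order, g = [], []
    for q in range(p - 1):                        # q variables already entered
        best_k, best_f = None, -1.0
        for k in remaining:
            v = a[k][y] * a[y][k] / a[k][k]       # drop in a[y][y] if k enters
            f = v * (n - q - 2) / (a[y][y] - v)   # assumed form of F
            if f > best_f:
                best_k, best_f = k, f
        order.append(best_k)
        g.append(best_f)                          # G_{q+1} = max F at this step
        remaining.remove(best_k)
        a = sweep(a, best_k)
    return order, g
```

Repeating this on many samples of the same size n drawn from the same
p variate normal gives the empirical distributions of the $G_i$.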

The tests that have been completed were performed on the five
variate normal specified by formulas 4.1 and 4.2. For each of six
sample sizes (50, 100, 150, 200, 250, and 300), 50 samples have been
processed. The results for $G_1$, $G_2$, $G_3$, $G_4$ for the sample
size 100 are plotted below in the form of estimated cumulative
distribution functions (c.d.f.'s). Where feasible, the graphs also
show the curve of the c.d.f. of $F_{(1,\,n-q-1)}$. (Recall that if the
F test of chapter VI had been applied, each value of $G_{q+1}$ would
have been compared with $F_{\alpha(1,\,n-q-1)}$ at step q+1, for
q = 0, 1, 2, 3.)

So far the minA parameter test has not been implemented in
program MV REGRESSION, so the type of artificial control of the
step-wise process described in chapter VII has not been tested.
However, a number of samples of size 300 of an 18 variate normal
(with the same 18 x 18 matrix used in Appendix B) have been processed
by MV REGRESSION, using rather wide limits on the halt criteria.
After examination of the first run it was obvious that three
variables in regression were too many and that either one or two
would be the right number. Since the sample size was large, most
samples allowed nine or more variables to enter regression on the
basis of passing the F test, even though nearly all of the variables
beyond two reduced the estimated conditional variance of $y_{18}$ by
less than 1.0 unit. By comparison, the first variable usually reduced
the estimated variance of $y_{18}$ from about 18.6 to about 6.5. An
examination of the computed statistics of all 17 variables (whether
in regression or not) made it apparent that some test such as the
minA test might be quite useful here.

Advantage was taken of the fact that the true p variate normal
was known when samples obtained from it were being analyzed by
MV REGRESSION. For example, after the first run on several samples
of the 18 variate normal, only six of the 17 possible predictors
ever got into regression by the third step. Hence, all possible
pairs of these six variables were fed back to MV SIM, which computed
the true conditional variance of $y_{18}$ for each pair.

The various halt criteria suggested in chapter VII can be useful
in developing methods of searching for optimal combinations of
variables in regression. It is proposed that procedures, such as the
one described below, be tested and compared with procedures already
described to see if better results can be obtained.

We will assume that an experimenter has a large sample that
perhaps was very expensive to obtain. We shall permit the
experimenter two computer runs on the same sample: the first run
providing a set of feed-back data for the second run.

The main purpose of the first computer run is to determine a
lower bound on the conditional variance of $y_p$. This is accomplished
by using the F (and min F) test with the step-wise procedure, with
$\alpha$ set to permit most variables to enter regression. Of course,
at each step valuable information such as the conditional variance of
$y_p$ and the amounts of variance reduction due to each variable
should be printed.

From the first run the experimenter chooses the maximum number
of variables, say m, that he will have in his final prediction
equation. This is usually easy to do by examining the decreasing
values of $s_{pp\cdot 1}$, $s_{pp\cdot 1,2}$, ...,
$s_{pp\cdot 1,\ldots,q}$, where q ($q \le p-1$) represents the
number of variables in regression after the first computer run.

The purpose of the second computer run is to make a rather
thorough (but not exhaustive) search for the optimal combination of
m variables in regression. The procedure is to conduct p-1 separate
regressions, each regression starting with a different first variable
and continuing until m variables are in regression. At each step
(after the first), the variable chosen to enter regression will be
the variable that can contribute the most reduction in the conditional
variance of $y_p$, unless, by adding this variable, a combination that
had been in regression previously (during a previous regression)
would result. For example, if the first regression added variables
in order $y_i$, $y_j$, $y_k$, then if the second regression proceeded
as $y_j$, $y_k$, variable $y_i$ would not be permitted to enter
regression next. Instead, the second best variable would be chosen at
this step.

Thus, after the second computer run is completed the
experimenter will have (p-1) x m prediction equations (and
conditional variances of $y_p$) to choose from, p-1 for each number
of variables in regression.
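
The second-run search can be sketched in Python as follows. This is
an illustrative reading of the proposal rather than a tested
procedure: the names are hypothetical, the pivot step is one common
form of the step-wise update, and the variance reduction of a
candidate is measured on the correlation scale. The sketch stops a
regression early if every possible extension has already been used.

```python
def sweep(a, k):
    """Pivot square matrix a on diagonal element k (enter variable k)."""
    n, d = len(a), a[k][k]
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = 1.0 / d
            elif i == k:
                out[i][j] = a[k][j] / d
            elif j == k:
                out[i][j] = -a[i][k] / d
            else:
                out[i][j] = a[i][j] - a[i][k] * a[k][j] / d
    return out


def restricted_search(r, m):
    """Run p-1 forward regressions, one per starting variable, up to m variables.

    r is the sample correlation matrix with y_p last. Returns, for each
    regression, the 0-based order in which variables entered.
    """
    p = len(r)
    y = p - 1
    seen = set()                        # combinations already in regression
    runs = []
    for first in range(p - 1):
        a = sweep([row[:] for row in r], first)
        chosen = [first]
        seen.add(frozenset(chosen))
        while len(chosen) < m:
            candidates = [k for k in range(p - 1)
                          if k not in chosen
                          and frozenset(chosen + [k]) not in seen]
            if not candidates:
                break                   # every extension was already tried
            # largest reduction in the conditional variance of y_p
            best = max(candidates, key=lambda k: a[k][y] * a[y][k] / a[k][k])
            chosen.append(best)
            seen.add(frozenset(chosen))
            a = sweep(a, best)
        runs.append(chosen)
    return runs
```

Each returned ordering yields one prediction equation for every
number of variables from one up to m, by taking its leading variables.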

Two further investigations are proposed. In Appendix B, the
results of tests of a number of samples of a five and an 18 variate
normal are described. As a result of the failure of the sample V-C
matrices, S, of the 18 variate normal to pass the chi-square test,
it is proposed that further testing of the multivariate normal
generator be conducted. As indicated in Appendix B, the possibility
of round-off error should be considered.

It is also suggested that a study be made to ascertain which
of the two suggested tests of the matrix S is better. Possibly a
study would indicate weakness in both. Anderson [1], section 10.8,
describes a third test of matrix S.

The step-wise procedure of regression analysis as described in
this paper is called the "forward" method because it starts with no
variables in regression and adds them to regression one at a time.
The forward method is used here because it permits computational
short-cuts, so that the number of computations can be minimized
(especially so when Efroymson's computer program algorithm is
used [3]). The backward operation of removing extraneous variables,
however, offers no computational advantages. See Quenouille [8].
Another reason the forward procedure can be done with fewer
computations is that usually the number of variables in the final
regression is much less than p-1. Often the reason a large number of
independent variables is examined, compared to the number finally
used, is that additional variables are created from those actually
measured to account for possible curvilinearity and interaction. For
example, if $X_1$ is a variable whose value was actually measured,
variables $Y = X_1^2$, $Z = X_1^3$ may be computed and used as part of
the original p-1 possible predictors ([9], see page 20).

One possible advantage in using the backward method is that the
process starts by computing an estimate of the lowest possible value
of the conditional variance of $y_p$, $s_{pp\cdot 1,\ldots,p-1}$. If
somehow this value could be obtained before the forward procedure was
performed, one could estimate the amount of reduction available in
the combination of variables still not in regression at each step.
Knowledge of this value at each step should be useful in deciding
which way would be best to go next: i.e., eliminate the weakest
variable now in regression, add the strongest variable still not in
regression, or halt.

Appendix A

GENERATION OF THE P VARIATE NORMAL
BY PROGRAM MV SIM


For the construction of each sample (of size one) from the
specified p variate normal, MV SIM uses an independent sample of
size p from the normal (0,1) distribution (i.e., mean $\mu = 0$,
variance $\sigma^2 = 1$).

To obtain each independent normal random sample (of size one),
MV SIM computes a function of an independent sample of size 12 from
the uniform (0,1) distribution (i.e., uniform on the interval zero
to one). That this function only approximates normally distributed
random numbers will be shown below.

It follows from the above that to generate a sample of size n
of a p variate normal, n x p x 12 random numbers from the uniform
(0,1) random number generator are required.

A discussion of several techniques for generating uniformly
distributed "pseudo" random numbers is given by Barron [2].
Empirical test procedures are also given.

The particular uniform (0,1) pseudo random number generator
used by MV SIM is a subroutine called RAND. RAND was programmed
according to specifications given by Green, Bert F., Jr., Smith,
J. E. K., and Klem, Laura [5]. The number of initial random numbers,
n in the reference, used by RAND is seven. This article also
discusses a number of empirical tests that have been applied to this
method.

The method by which MV SIM uses 12 independent uniform (0,1)
random numbers to compute each pseudo normal (0,1) random number
is discussed by Vaa [10], see page 40. Briefly, each normal random
number is computed as:

$$x = \sum_{j=1}^{12} w_j - 6$$

where the $w_j$ are the required independent sample of size 12 from
the uniform (0,1) distribution. The variance of the uniform (0,1)
distribution is one-twelfth, and variances of independent, uniformly
distributed random variables are additive under convolution. Hence
it is convenient to select 12 as the number of uniform random
variables whose sum will approximate a normal variable. Means of
(independent) uniform variables are also additive, so that it remains
to subtract the constant six from the sums of 12 independent uniform
(0,1) random variables to approximate the normal (0,1) distribution.
Vaa has a discussion of the advantages and disadvantages of this
"truncated" approximation to the normal distribution.

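The sum-of-twelve-uniforms approximation described above is short
enough to show directly. The Python sketch below is illustrative
only: it draws its uniform numbers from Python's standard random
module, which is a stand-in for (not a reproduction of) the RAND
subroutine, and the function name is hypothetical.

```python
import random


def approx_normal(rng=random):
    """One pseudo normal (0,1) number: the sum of 12 uniform (0,1) numbers minus 6."""
    return sum(rng.random() for _ in range(12)) - 6.0
```

Every value produced this way lies between -6 and +6, which is the
sense in which the approximation is "truncated".
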
Wold [11], pages xi to xiii, describes the method which MV SIM
uses to convert an independent sample of size p from the normal
(0,1) distribution to a sample from a p variate normal specified by
U and $\Sigma$. This method requires the computation of a p x p
triangular matrix, $P = [p_{ij}]$, from the original V-C matrix,
$\Sigma$, so that the following matrix equation holds:

$$P P' = \Sigma$$
For our discussion we arbitrarily choose the triangulation of
$P = [p_{ij}]$ so that $p_{ij} = 0$ when $j > i$, i.e., let all "upper
diagonal" elements of P equal zero. Next, assuming $x_1,\ldots,x_p$
is an independent sample of size p from normal (0,1), the sample of
size one of the p variate normal is computed as:

$$y_1 = u_1 + p_{11} x_1$$
$$y_2 = u_2 + p_{21} x_1 + p_{22} x_2$$
$$\vdots$$
$$y_p = u_p + p_{p1} x_1 + \cdots + p_{pp} x_p$$

where the $u_i$ are the elements of mean vector U.
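
A sketch of this conversion in Python follows. It assumes that the
required triangular matrix is the lower triangular factor satisfying
$P P' = \Sigma$, computed here by the ordinary Cholesky recursion;
whether MV SIM arranges its arithmetic in exactly this way is not
stated, and the function names are hypothetical.

```python
import math


def lower_triangular_factor(sigma):
    """Return lower triangular P with P P' = sigma (Cholesky factorization)."""
    p = len(sigma)
    P = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(i + 1):
            s = sigma[i][j] - sum(P[i][k] * P[j][k] for k in range(j))
            P[i][j] = math.sqrt(s) if i == j else s / P[j][j]
    return P


def transform(u, P, x):
    """Map independent normal (0,1) values x to one observation y = u + P x."""
    return [u[i] + sum(P[i][j] * x[j] for j in range(i + 1))
            for i in range(len(u))]
```

The factorization requires $\Sigma$ to be positive definite; a matrix
that is very close to singular makes the square roots on the diagonal
sensitive to round-off.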

The term "pseudo" random number is customarily given to numbers
generated by arithmetic means (see Barron [2], pages 5, 6), of which
the RAND subroutine is one.

It is now clear that the samples of size n of the p variate
normal generated by MV SIM are themselves pseudo random numbers,
since they are merely arithmetic functions of uniform pseudo random
numbers. Perhaps in this context, the operation of this part of
MV SIM might have been called "simulation" of a p variate normal,
rather than "generation". To carry this process one step further,
sample mean vector Z and V-C matrix S, being arithmetic functions
of a sample of size n, are likewise pseudo random matrices. As in
the case of the pseudo uniform and normal random numbers, it is
desirable that some empirical tests be applied to these pairs of
pseudo random matrices.

Appendix B describes some tests in detail: one for vector Z,
and one for matrix S. These tests are (optionally) performed by
MV SIM on each sample, but MV SIM takes no corrective action except
to print out the value of the computed statistics and an indication
of the proper distribution to be compared with the statistics.

The Sequential Operation of Program MV SIM is as follows:

1.  Print out input mean vector U, and V-C matrix $\Sigma$, and
    other miscellaneous data identifying the computer run.

2.  Compute the P matrix from $\Sigma$ as described above.
    Optionally, the P matrix may be printed out.

3.  List the variance of $y_p$, $\sigma_{pp}$.

4.  Compute the prediction equation for $y_p$, for each
    combination of variables, $y_1,\ldots,y_{p-1}$, that is
    specified by the program user as input. For each such
    regression the following data are printed:

    a)  regression number
    b)  q+1 variate normal, where q is the number of variables
        in regression
    c)  multiple correlation coefficient (squared)
    d)  conditional variance of $y_p$, $\sigma_{pp\cdot 1,\ldots,q}$
    e)  the regression coefficients, $\beta_i$ (optional)

5.  Print out input data regarding samples of the specified
    distribution as described and illustrated in chapter VIII,
    i.e., numbers of samples, observation costs, whether
    "standard" or "cost" option is used, etc.

The following operations are performed on each sample specified:

6.  Generate the required sample of the specified p variate
    normal.

7.  Compute sample mean vector Z, and V-C matrix, S. Print
    out Z and S.

8.  Test sample means, Z (optional) (see Appendix B). Print
    out eigenvectors and eigenvalues of matrix S, from which
    the proper statistic is computed. Also print out the
    statistic and the proper degrees of freedom of F to be
    used for comparison.

9.  Test sample matrix, S (optional) (see Appendix B).
    Print out eigenvalues of sample matrix S. Print out
    the statistic to be compared with the chi-squared
    distribution. Also print out proper degrees of freedom to
    be used for comparison.

Of course, the user of program MV SIM can omit some of the above
items, such as items 3 and 4, at his discretion.

The actual analysis of each sample and associated printed output
performed by MV REGRESSION is described and illustrated in detail in
chapter VIII.

Appendix B

TESTS OF SAMPLE MEAN VECTOR, Z,
AND SAMPLE VARIANCE-COVARIANCE MATRIX, S


For a discussion of some of the problems encountered in generating
random numbers by arithmetic means, see Barron [2] and Vaa [10].

Graybill [4], page 206, shows that if Y is a p variate normal
with mean vector U and V-C matrix $\Sigma$, then the quantity:

$$v = \frac{n\,(n-p)}{p\,(n-1)}\,(Z-U)'\,S^{-1}\,(Z-U)$$

is distributed as $F_{(p,\,n-p)}$, if indeed Z and S are computed from
a sample of size n from the specified p variate normal. Hence, to
test a sample mean vector, Z, an appropriate level, $\alpha$ (usually
.05), is chosen. Then if v is less than $F_{\alpha(p,\,n-p)}$, vector
Z is accepted as having been computed from a reasonable sample;
otherwise Z is rejected.
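
The statistic for the test of Z can be computed as in the Python
sketch below. The sketch uses the standard Hotelling form, which
includes the factor n multiplying the quadratic form, and assumes S
is the usual sample V-C matrix with divisor n - 1; the function names
are hypothetical, and the critical value of F must be supplied from
tables or another routine.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]   # augmented copy
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                     # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def mean_test_statistic(z, u, s, n):
    """Return (v, p, n - p): the statistic and its F degrees of freedom."""
    p = len(z)
    d = [zi - ui for zi, ui in zip(z, u)]
    w = solve(s, d)                                    # S^{-1} (Z - U)
    quad = sum(di * wi for di, wi in zip(d, w))        # (Z-U)' S^{-1} (Z-U)
    v = n * quad * (n - p) / (p * (n - 1))
    return v, p, n - p
```

The vector Z is accepted when the returned v is below the tabled
value of F at the chosen level with the returned degrees of freedom.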

To perform a test for a sample V-C matrix, S, an orthogonal
transformation is performed on both $\Sigma$ and S, separately,
yielding diagonal matrices $\Lambda$ and D respectively. $\Lambda$ is
a V-C matrix of a p variate normal with independent variables (i.e.,
all covariances are equal to zero). Now, if it is true that S is
computed from a sample drawn from a p variate normal with V-C matrix
$\Sigma$, then D should be a sample drawn from a p variate normal
with V-C matrix $\Lambda$. Hence, a test that D is a sample from
$\Lambda$ should verify that S is a sample from $\Sigma$.

Since each element of D, $s_{ii}$ (i = 1,...,p), is a sample
variance, and since each element of $\Lambda$, $\sigma_{ii}$, is the
true variance corresponding to element $s_{ii}$, for all i,
intuitively it appears that each of the statistics:

$$(s_{ii} / \sigma_{ii})\,(n-1), \qquad i = 1,\ldots,p,$$

should have the chi-square distribution with n - 1 degrees of freedom
(n is still the sample size). From this, and the fact that $s_{ii}$ is
statistically independent of $s_{jj}$ for all i, j = 1,...,p
($i \ne j$), it follows that the statistic:

$$(n-1)\sum_{i=1}^{p} \frac{s_{ii}}{\sigma_{ii}} \qquad \text{(B1)}$$

has the chi-square distribution with p(n-1) degrees of freedom,
since the degrees of freedom of sums of independent chi-squares are
additive.

Hence, to test each sample V-C matrix, S, MV SIM "rotates" $\Sigma$
and S, and computes formula B1 above from $\Lambda$ and D. Printed out
(optionally) are the p diagonal elements of $\Lambda$ and D (the eigenvalues
of matrices $\Sigma$ and S respectively). Also printed are the result of
formula B1 and the number of degrees of freedom of the chi-square
distribution to be used for comparison.
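
The computation of B1 can be sketched in a few lines of Python. This is
only an illustrative sketch, not the MV SIM program itself: the function
names `jacobi_eigenvalues` and `b1_statistic` are hypothetical, the cyclic
Jacobi rotation method is assumed as the means of "rotating" a symmetric
matrix, and the diagonal elements of $\Lambda$ and D are assumed to be paired
after sorting each set in ascending order.

```python
import math

def jacobi_eigenvalues(m, tol=1e-18, max_sweeps=50):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations.

    Assumed method for the "rotation"; returns the diagonal of the
    rotated matrix sorted in ascending order.
    """
    n = len(m)
    a = [row[:] for row in m]  # work on a copy
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:  # off-diagonal part is negligible: matrix is diagonal
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                # Angle (at most 45 degrees in size) that zeroes a[p][q].
                theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # update columns p and q: A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q: A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))

def b1_statistic(sigma, s, n):
    """Formula B1 and its degrees of freedom for sample size n."""
    lam = jacobi_eigenvalues(sigma)  # diagonal elements of Lambda
    d = jacobi_eigenvalues(s)        # diagonal elements of D
    b1 = (n - 1) * sum(di / li for di, li in zip(d, lam))
    return b1, len(lam) * (n - 1)

# Hypothetical 3 x 3 V-C matrix, used only to exercise the functions.
sigma = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b1, dof = b1_statistic(sigma, sigma, 100)
print(b1, dof)
```

When S is exactly equal to $\Sigma$ every ratio is one, so B1 equals its
degrees of freedom, p(n - 1); a sample S drawn from $\Sigma$ should give a
value scattered about that number.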

Programs MV SIM and MV REGRESSION were used to generate and test
a number of samples from two different p variate normals. One of
these normals is specified by 4.1 and 4.2 (five variate normal). The
other distribution was an 18 variate normal that was very close to
being singular. (Several sets of rows were close to each other in
value.)

Six sample sizes: 50, 100, 150, 200, 250, and 300 were studied
of the five variate normal, with 20 samples tested of each size.

Four sample sizes: 50, 100, 150, and 200 were studied of the
18 variate normal, with 20 samples tested of each size.

For the five variate normal, both statistics (for Z and S)
appeared to behave as samples from their respective F and chi-square
distributions for all sample sizes.

However, curious results were obtained from the unusual 18
variate normal tested. All Z tests passed as nicely as for the five
variate normal. The values of chi-square, on the other hand, were much
too high, indicating that poor sample V-C matrices, S, were being
generated. For example, for the 20 samples of size 100 (of the 18
variate normal) the statistic B1 should behave as chi-square with 1782
degrees of freedom (which is also the mean of that distribution). The
20 computed values of B1 ranged from 2213 to 2683.
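
To see how far these values lie from what is expected, recall that a
chi-square variable with k degrees of freedom has mean k and variance
2k. The short Python sketch below (the function name `chi_square_z` is
hypothetical) expresses an observed value of B1 as a number of standard
deviations above the mean.

```python
import math

def chi_square_z(b1, p, n):
    """Standardized distance of B1 from the mean of chi-square with p(n-1) d.f."""
    dof = p * (n - 1)          # 18 * 99 = 1782 for the case in the text
    sd = math.sqrt(2 * dof)    # variance of chi-square is twice its d.f.
    return (b1 - dof) / sd

for b1 in (2213, 2683):        # smallest and largest of the 20 computed values
    print(b1, round(chi_square_z(b1, 18, 100), 1))
```

The smallest computed value is thus about 7 standard deviations above
the mean and the largest about 15, so none of the 20 values is
plausible as a draw from the reference distribution.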

One possible reason for these poor results is the use of a poor
random number generator. However, the satisfactory results obtained
from testing the five variate normal, as well as the tests of the
uniform random number generator conducted previously, lead one to
seek a different source of error.

Possibly a more reasonable explanation is computer round off
error. The large number of computations required to rotate an
18 x 18 matrix, plus the fact that the matrices were all nearly
singular, could very likely cause this type of error. If this is
the case, the generated sample V-C matrices themselves may be "good"
samples that are merely difficult to test.

Another interesting possibility is the method used to rotate
matrix S for the test. Recall that rotating a symmetric matrix, $\Sigma$,
to yield a diagonal matrix, $\Lambda$, can always be done by finding an
orthogonal matrix, $R_1$, so that the following is satisfied:

$$ R_1^{T}\, \Sigma\, R_1 = \Lambda . \tag{B2} $$

Also, since S is symmetric as well, $R_2$ can be found so that

$$ R_2^{T}\, S\, R_2 = D , $$

where D is diagonal. Since $\Sigma$ and S are not exactly equal it follows
that orthogonal matrices $R_1$ and $R_2$ will not be equal.

Perhaps one might argue that a "better" test would be to find
$R_1$ from the rotation of $\Sigma$, B2 above, and then compute

$$ R_1^{T}\, S\, R_1 = D , $$

where D should be nearly diagonal if S is a reasonable sample from
$\Sigma$; then compare the diagonal elements of this D with those of $\Lambda$
as described above.
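
The suggested alternative can be illustrated on a small case. The
Python sketch below is restricted to 2 x 2 matrices, where the rotation
has a closed form; the function names `rotation_2x2` and `congruence`
and the numerical values of the two matrices are hypothetical and serve
only as an illustration.

```python
import math

def rotation_2x2(m):
    """Orthogonal R such that R^T m R is diagonal (m symmetric, 2 x 2)."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2.0 * b, a - d)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def congruence(r, m):
    """Return the product R^T m R."""
    n = len(m)
    mr = [[sum(m[i][k] * r[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
    return [[sum(r[k][i] * mr[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

sigma = [[2.0, 1.0], [1.0, 2.0]]   # hypothetical true V-C matrix
s = [[2.2, 0.9], [0.9, 1.9]]       # hypothetical sample V-C matrix

r1 = rotation_2x2(sigma)           # rotation found from sigma alone (B2)
lam = congruence(r1, sigma)        # exactly diagonal up to round off
d_alt = congruence(r1, s)          # S rotated by R1: only nearly diagonal

print(lam)
print(d_alt)
```

Here the diagonal of the rotated $\Sigma$ is (3, 1), while S rotated by the
same $R_1$ has diagonal elements of about 2.95 and 1.15 and a small
off-diagonal element of about -0.15. Under this scheme each diagonal
element of the rotated S is compared with the element of $\Lambda$ belonging
to the same axis, and the size of the off-diagonal remainder gives a
further indication of how well S agrees with $\Sigma$.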